Abstract

This thesis will be concerned with distributed control and coordination of networks 
consisting of multiple, potentially mobile, agents. This is motivated mainly by the 
emergence of large scale networks characterized by the lack of centralized access to 
information and time-varying connectivity. Control and optimization algorithms de- 
ployed in such networks should be completely distributed, relying only on local ob- 
servations and information, and robust against unexpected changes in topology such 
as link failures. 

We will describe protocols to solve certain control and signal processing prob- 
lems in this setting. We will demonstrate that a key challenge for such systems is 
the problem of computing averages in a decentralized way. Namely, we will show 
that a number of distributed control and signal processing problems can be solved 
straightforwardly if solutions to the averaging problem are available. 

The rest of the thesis will be concerned with algorithms for the averaging problem 
and its generalizations. We will (i) derive the fastest known averaging algorithms in a
variety of settings and subject to a variety of communication and storage constraints;
(ii) prove a lower bound identifying a fundamental barrier for averaging algorithms; and
(iii) propose a new model for distributed function computation which reflects the
constraints facing many large-scale networks, and nearly characterize the general
class of functions which can be computed in this model.

Thesis Supervisor: John N. Tsitsiklis 

Title: Clarence J Lebel Professor of Electrical Engineering 
Chapter 1
Introduction 



This thesis is about certain control and signal processing problems over networks with 
unreliable communication links. Some motivating scenarios are: 

a. Distributed estimation: a collection of sensors are trying to estimate an un- 
known parameter from observations at each sensor. 

b. Distributed state estimation: a collection of sensors are trying to estimate the 
(constantly evolving) state of a linear dynamical system from observations of 
its output at each sensor. 

c. Coverage control: A group of robots wish to position themselves so as to opti- 
mally monitor an environment of interest. 

d. Formation control: Several UAVs or vehicles are attempting to maintain a for- 
mation against random disturbances to their positions. 

e. Distributed task assignment: allocate a collection of tasks among agents with 
individual preferences in a distributed way. 

f. Clock synchronization. A collection of clocks are constantly drifting apart, and 
would like to maintain a common time as much as this is possible. Various pairs 
of clocks can measure noisy versions of the time offsets between them. 

We will use "nodes" as a common word for sensors, vehicles, UAVs, and so on. A key 
assumption we will be making is that the communication links by means of which the 
nodes exchange messages are unreliable. A variety of simple and standard techniques 
can be used for any of the above problems if the communication links never fail. By 
contrast, we will be interested in the case when the links may "die" and "come online" 
unpredictably. We are interested in algorithms for the above problems which work 
even in the face of this uncertainty. 

It turns out that a key problem for systems of this type is the problem of computing
averages, described next. Nodes $1, \ldots, n$ each begin with a real number $x_i$.
We are given a discrete sequence of times $t = 1, 2, 3, \ldots$, and at each time step a
communication graph $G(t) = (\{1, \ldots, n\}, E(t))$ is exogenously provided by "nature",
determining which nodes can communicate: node $i$ can send messages to node $j$ at
time $t$ if and only if $(i, j) \in E(t)$. For simplicity, no restrictions are placed on the
messages nodes can send to each other, and in particular, the nodes may broadcast
their initial values to each other. The nodes need to compute $(1/n) \sum_{i=1}^n x_i$ subject
to as few assumptions as possible about the communication sequence $G(t)$. We will
call this the averaging problem, and we will call any algorithm for it an averaging
algorithm.

The averaging problem is key in the sense that a variety of results are available 
describing how to use averaging algorithms to solve many other distributed problems 
with unreliable communication links. In particular, for each of the above problems 
(distributed estimation, distributed state estimation, coverage control, formation con- 
trol, clock synchronization), averaging algorithms are a cornerstone of the best cur- 
rently known solutions. 

The remainder of this chapter will begin by giving a historical survey of averaging
algorithms, followed by a list of the applications of averaging, including the problems
listed above.

1.1 A history of averaging algorithms 

The first paper to introduce distributed averaging algorithms was by DeGroot [36].
DeGroot considered a simple model of "attaining agreement": $n$ individuals are on
a team or committee and would like to come to a consensus about the probability
distribution of a certain parameter $\theta$. Each individual $i$ begins with a probability
distribution $F_i$ which they believe is the correct distribution of $\theta$. For simplicity, we
assume that $\theta$ takes on a value in some finite set $\Omega$, so that each $F_i$ can be described
by $|\Omega|$ numbers.

The individuals now update their probability distributions as a result of interacting
with each other. Letting $F_i(t)$ be the distribution believed by $i$ at time $t$, the agents
update as $F_i(t+1) = \sum_{j=1}^n a_{ij} F_j(t)$, with the initialization $F_i(0) = F_i$. Here, $a_{ij}$ are
weights chosen by the individuals. Intuitively, people may give high weights to a
subset of the people they interact with, for example if a certain person is believed to
be an expert in the subject. On the other hand, some weights may be zero, which
corresponds to the possibility that some individuals ignore each other's opinions. It is
assumed, however, that all the weights $a_{ij}$ are nonnegative, and $a_{i1}, \ldots, a_{in}$ add up
to 1 for every $i$. Note that the coefficients $a_{ij}$ are independent of time, corresponding
to a "static" communication pattern: individuals do not change how much they trust
the opinions of others.
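
As a minimal illustration of the update just described (a sketch, not DeGroot's own presentation), the following Python snippet iterates $F_i(t+1) = \sum_j a_{ij} F_j(t)$ for a hypothetical three-person committee. The weight matrix `A`, the initial opinions `F0`, and the function names are assumptions made purely for illustration.

```python
def degroot_step(A, F):
    """One round of opinion pooling: F_i <- sum_j A[i][j] * F_j.

    A is a row-stochastic weight matrix (nonnegative rows summing to 1).
    F is a list of probability vectors, one per individual.
    """
    n, m = len(F), len(F[0])
    return [[sum(A[i][j] * F[j][k] for j in range(n)) for k in range(m)]
            for i in range(n)]


def degroot(A, F0, steps):
    """Iterate the pooling rule for a fixed number of steps."""
    F = [row[:] for row in F0]
    for _ in range(steps):
        F = degroot_step(A, F)
    return F


# Hypothetical example: three individuals, a parameter with two possible values.
A = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]   # symmetric, hence doubly stochastic
F0 = [[1.0, 0.0],
      [0.0, 1.0],
      [0.5, 0.5]]
F_final = degroot(A, F0, steps=50)
```

Because this particular `A` is symmetric (and so doubly stochastic), every row of `F_final` approaches the average of the initial distributions, here `[0.5, 0.5]`.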

DeGroot gave a condition for these dynamics to converge, as well as a formula
for the limiting opinion; subject to some natural symmetry conditions, the limiting
opinion distribution will equal the average of the initial opinion distributions $F_i$.
DeGroot's work was later extended by Chatterjee and Seneta [30] to the case where
the weights $a_{ij}$ vary with time. The paper [30] gave some conditions on the
time-varying sequence $a_{ij}(t)$ required for convergence to agreement on a single distribution
among the individuals.
The same problem of finding conditions on $a_{ij}(t)$ necessary for agreement was
addressed in the works [HH [H21 [E], which were motivated by problems in parallel
computation. Here, the problem was phrased slightly differently: $n$ processors each
begin with a number $x_i$ stored in memory, and the processors need to agree on a single
number within the convex hull $[\min_i x_i, \max_i x_i]$. This is accomplished by iterating as
$x_i(t+1) = \sum_j a_{ij} x_j(t)$. This problem was a subroutine of several parallel optimization
algorithms [HI]. It is easy to see that it is equivalent to the formulation in terms of
probability distributions addressed by DeGroot [36] and Chatterjee and Seneta [30].

The works [HI [92] [H] gave some conditions necessary for the estimates $x_i(t)$
to converge to a common value. These were in the same spirit as [30], but were
more combinatorial than [30], being expressed directly in terms of the coefficients
$a_{ij}(t)$. These conditions boiled down to a series of requirements suggesting that the
agents have repeated and nonvanishing influence over each other. For example, the
coefficients $a_{ij}(t)$ should not be allowed to decay to zero, and the graph sequence $G(t)$
containing the edges $(i, j)$ for which $a_{ji}(t) > 0$ needs to be "repeatedly connected."

Several years later, a similar problem was studied by Cybenko [35], motivated by
load balancing problems. In this context, $n$ processors each begin with a certain
number of jobs $x_i$. The variable $x_i$ can only be an integer, but assuming a large
number of jobs in the system, this assumption may be dispensed with. The processors
would like to equalize the load. To that end, they pass around jobs: processors with
many jobs try to offload their jobs on their neighbors, and processors with few jobs
request more from their neighbors. The number of jobs of processor $i$ behaves
approximately as $x_i(t+1) = x_i(t) + \sum_j a_{ij}(x_j(t) - x_i(t))$, which, subject to some
conditions on the coefficients $a_{ij}$, may be viewed as a special case of the iterations
considered in [201 ED [S].

Cybenko showed that when the neighborhood structure is a hypercube (i.e., we
associate with each processor a string of $\log n$ bits, and $a_{ij} \neq 0$ whenever processors
$i$ and $j$ differ by at most 1 bit), the above processes may converge quite fast: for some
natural simple processes, an appropriately defined convergence time is on the order
of $\log n$ iterations.
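
The load balancing update on a hypercube can be sketched as follows. This is only an illustration under stated assumptions: loads are treated as real numbers (as the text permits), every neighbor gets the same weight `alpha`, and the value `alpha = 1/(d+1)`, the initial loads, and the function names are hypothetical choices rather than anything taken from [35].

```python
def hypercube_neighbors(i, d):
    """Processors whose d-bit labels differ from i in exactly one bit."""
    return [i ^ (1 << b) for b in range(d)]


def diffusion_step(x, d, alpha):
    """x_i <- x_i + alpha * sum over neighbors j of (x_j - x_i)."""
    return [xi + alpha * sum(x[j] - xi for j in hypercube_neighbors(i, d))
            for i, xi in enumerate(x)]


d = 3                      # hypercube dimension, so n = 2**d processors
alpha = 1.0 / (d + 1)      # illustrative choice of weight (an assumption)
loads = [float(8 * i) for i in range(2 ** d)]   # hypothetical initial loads
target = sum(loads) / len(loads)

x = loads
for _ in range(60):
    x = diffusion_step(x, d, alpha)
```

Since the weights are symmetric, the total load is preserved at every step, and in this example each entry of `x` approaches `target`.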

Several years later, a variation on the above algorithms was studied by Vicsek 
et al. [HI]. Vicsek et al. simulated the following scenario: n particles were placed 
randomly on a torus with random initial direction and constant velocity. Periodically, 
each particle would try to align its angle with the angles of all the particles within 
a certain radius. Vicsek et al. reported that the end result was that the particles 
aligned on a single direction. 
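
A rough sketch of the kind of alignment step described above is given below. It is an assumption-laden illustration, not the simulation of Vicsek et al.: the torus is taken to be a square of side `side`, "aligning" is read as adopting the direction of the mean heading vector of all particles within `radius` (the particle itself included), and no noise term is modeled. All names and parameter values are hypothetical.

```python
import math
import random


def torus_dist(p, q, side):
    """Euclidean distance on a square torus of the given side length."""
    dx = abs(p[0] - q[0])
    dx = min(dx, side - dx)
    dy = abs(p[1] - q[1])
    dy = min(dy, side - dy)
    return math.hypot(dx, dy)


def vicsek_step(pos, theta, side, radius, speed):
    """Align each heading with the mean direction of the particles within
    `radius`, then move every particle at constant speed."""
    n = len(pos)
    new_theta = []
    for i in range(n):
        s = c = 0.0
        for j in range(n):
            if torus_dist(pos[i], pos[j], side) <= radius:
                s += math.sin(theta[j])
                c += math.cos(theta[j])
        new_theta.append(math.atan2(s, c))
    new_pos = [((x + speed * math.cos(a)) % side,
                (y + speed * math.sin(a)) % side)
               for (x, y), a in zip(pos, new_theta)]
    return new_pos, new_theta


rng = random.Random(0)
side, n = 1.0, 20
pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
theta = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
# Radius at least the torus diameter: every particle sees every other one.
pos1, theta1 = vicsek_step(pos, theta, side, radius=side, speed=0.03)
```

With a radius this large all particles share the same neighborhood, so a single step already puts them on a common heading; with a small radius the neighborhoods change as the particles move, which is what makes the dynamics hard to analyze.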

The paper [52] provided a theoretical justification of the results in [SI] by proving
the convergence of a linearized version of the update model of [91]. The results of
[52] are very similar to the results in [91], [92], modulo a number of minor modifications.
The main difference appears to be that [91], [92] make certain assumptions on
the sequence $G(t)$ that are not made in [52]; these assumptions, however, are never
actually used in the proofs of [91], [92]. We refer the reader to [13] for a discussion.

The paper [52] has created an explosion of interest in averaging algorithms, and 
the subsequent literature expanded in a number of directions. It is impossible to give 
a complete account of the literature since [52] in a reasonable amount of space, so we 
give only a brief overview of several research directions.

Convergence in some natural geometric settings. One interesting direction of 
research has been to analyze the convergence of some plausible geometric processes, 
for which there is no guarantee that any of the sufficient conditions for consensus (e.g. 
from [92] or [52]) hold. Notable in this direction was [34], which proved convergence 
of one such process featuring all-to-all communication with decaying strength. In a 
different direction, [27], [26] give tight bounds on the convergence of averaging dynamics
when the communication graph corresponds to nearest neighbors of points in R^. 

General conditions for averaging. A vast generalization of the consensus conditions
in [92] and [52] was given in [73]. Using set-valued generalizations of the
Lyapunov functions in [92], [52], it was shown that a large class of possible nonlinear
maps lead to consensus. Further investigation of these results was given in [2] and 

Quantized consensus. The above models assume that nodes can transmit real 
numbers to each other. It is natural to consider quantized versions of the above 
schemes. One may then ask about the tradeoffs between storage and performance. 
A number of papers explored various aspects of this tradeoff. In [51], a simple
randomized scheme for achieving approximate averaging was proposed. Further research
along the same lines can be found in [103] and [IQ]. A dynamic scheme which allows
us to approximately compute the average as the nodes communicate more and more
bits with each other can be found in [2T].

We will also consider this issue in this thesis, namely in Chapters 7 and 8, which
are based on the papers [76] and [50], respectively. In Chapter 7, we will give a recipe
for quantizing any linear averaging scheme to compute the average approximately. In
Chapter 8, we will consider the problem of computing the average approximately
with a deterministic algorithm, when each node can store only a constant number of
bits for each link it maintains.

Averaging with coordinates on geometric random graphs. Geometric random
graphs are common models for sensor networks. It is therefore of interest to try
to specialize results for averaging to the case of geometric random graphs. Under the 
assumption that every node knows its own exact coordinates, an averaging algorithm 
with a lower than expected averaging cost was developed in [37] . Further research in 
[TUl [23] reduced the energy cost even further. Substantial progress towards removing 
the assumption that each node knows its coordinates was recently made in [50] . 

Design of fast averaging algorithms on fixed graphs. It is interesting to con- 
sider the fastest averaging algorithm for a given, fixed graph. It is hoped that an 
answer would give some insight into averaging algorithms: what their optimal speed 
is, how it relates to graph structure, and so on. In [97], the authors showed how to 
compute optimal symmetric linear averaging algorithms with semidefinite programming.
Some further results for specific graphs were given in [HI [19]. Optimization
over a larger class of averaging algorithms was considered in [SS] and also in [^S].
Analysis of social networks. We have already mentioned the work of DeGroot [36],
which was aimed at modeling the interactions of individuals via consensus-like updates.
A number of recent works have taken this line of analysis further by analyzing how
the combinatorial structure of social networks affects the outcome. In particular, [17]
studied how good social networks are at aggregating distributed information in terms
of various graph-cut related quantities. The recent work [1] quantified the extent to
which "forceful" agents which are not influenced by others interfere with information
aggregation.

1.2 Applications of averaging 

We give an incomplete list of the use of averaging algorithms in various applications. 

a. Consider the following distributed estimation problem: a sensor network would 
like to estimate some unknown vector of parameters. At a discrete set of times, 
some sensors make noise-corrupted measurements of a linear function of the 
unknown vector. The sensors would like to combine the measurements that are 
coming in into a maximum likelihood estimate. 

We discuss a simpler version of this problem in the following chapter, and
describe some averaging-based algorithms. In brief, other known techniques
(flooding, fusion along a spanning tree) suffer from either high storage requirements
or lack of robustness to link failures. The use of averaging-based algorithms
allows us to avoid these pitfalls, as we will explain in Chapter 2. For
some literature on this subject, we refer the reader to [98], [99] and [21].

b. Distributed state estimation: a collection of sensors are trying to estimate the 
(constantly evolving) state of a linear dynamical system. Each sensor is able to 
periodically make a noise corrupted measurement of the system output. The 
sensors would like to cooperate on synthesizing a Kalman filter estimate. 

There are a variety of challenges involved in such a problem, not least of which
is the delay involved in receiving at one node the measurements from other
nodes which are many hops away. Several ideas have been presented in the
literature for solving this sort of problem based on averaging algorithms. We
refer the reader to [SH ESI |231 EH].

c. Coverage control is the problem of optimally positioning a set of robots to 
monitor an area. A typical case involves a polygon-shaped area along with 
robots which can measure distances to the boundary as well as to each other. 
Based on these distances, it is desirable to construct controllers which cover
the entire area, yet assign as little area as possible to each robot. A common
addition to this setup involves associating a number $f(x)$ to each point $x$ in the
polygon, representing the importance of monitoring this point. The robots then
optimize a corresponding objective function which weights regions according to 
the importance of the points in them. 
Averaging algorithms have proven very useful in designing
distributed controllers for such systems. We refer the reader to for the 
connection between distributed controllers for these systems and averaging over 
a certain class of graphs defined by Voronoi diagrams. A similar approach 
was adopted in [89]. Note, also, the related paper [68] and the extension to 
nonuniform coverage in [69].

d. Formation control is the problem of maintaining a set formation, defined by a
collection of relative distances, against random or adversarial disturbances. Every
once in a while pairs of agents manage to measure the relative offset between
them. The challenge is for the agents to use these measurements and take
actions so that in the end everyone gets into formation.

We discuss this problem in greater detail in Chapter 2, where we explain several
formation control ideas originating in [83]. Averaging theorems can show the 
possibility of working formation control in this setting, subject only to very 
intermittent communication. 

e. The task assignment problem consists in distributing a set of tasks among a 
collection of agents. This arises, for example, in the case of a group of aircraft or 
robots who would like to make decisions autonomously without communication 
with a common base. A typical example is dividing a list of locations to be 
monitored among a group of aircraft. 

In such cases, various auction based methods are often used to allocate tasks. 
Averaging and consensus algorithms provide the means by which these auctions 
are implemented in a distributed way; we refer the reader to the papers [291 IM] 
for details. 

f. Consider a collection of clocks which are constantly drifting apart. This is a 
common scenario, because clocks drift randomly depending on various factors 
within their environment (e.g. temperature), and also because clocks have a 
(nonzero) drift relative to the "true" time. Maintaining a common time as 
much as possible is important for a number of estimation problems (for example, 
direction of arrival problems). 

An important problem is to design distributed protocols to keep the clocks
synchronized. These try to keep clock drift to a minimum, at least between
time periods when an outside source can inform each node of the correct time.
This problem has a natural similarity to averaging, except that one does not
care very much about getting the average right; rather, agreement on any time will
do. Moreover, the constantly evolving times present a further challenge.

A natural approach, explored in some of the recent literature, is to adapt averaging
techniques to work in this setting. We refer the reader to [871 1221 IHl SH llOlj.
1.3 Main contributions



This thesis is devoted to the analysis of the convergence time of averaging schemes 
and to the degradation in performance as a result of quantized communication. What 
follows is a brief summary of our contributions by chapter. 

Chapters 2 and 3 are introductory. We begin with Chapter 2, which seeks to
motivate the practical use of averaging algorithms. We compare averaging algorithms
to other ways of aggregating information, such as flooding and leader-election based
methods, and discuss the various advantages and disadvantages. Our main point is
that schemes based on distributed averaging possess two unique strengths: robustness
to link failures and economical storage requirements at each node.

Next, in Chapter 3, we discuss the most elementary known results on the convergence
of averaging methods. The rest of this thesis will be spent on improving and
refining the basic results in this chapter. 

In Chapter 4, we give an exposition of the first polynomial bound on the
convergence time of averaging algorithms. Previously known bounds, such as
those described in Chapter 3, grew exponentially in the number of nodes
$n$ in the worst case. In the subsequent Chapter 5, we give an averaging
algorithm whose convergence time scales as $O(n^2)$ steps on nearly arbitrary
time-varying graph sequences. This is currently the best averaging algorithm in terms of
convergence time bounds.

We next ask whether it is possible to design averaging algorithms which improve on
this quadratic scaling. In Chapter 6, we prove that it is in fact impossible to beat
the $n^2$ time steps bound within a large class of (possibly nonlinear) update schemes.
The schemes we consider do not exhaust all possible averaging algorithms, but they
do encompass the majority of averaging schemes proposed thus far in the literature.

We then move on to study the effect of quantized communication and storage. 
Chapter 7 gives a recipe for quantizing any linear averaging scheme. The quantization
performs averaging while storing and transmitting only c log n bits. It is shown that 
this quantization preserves the convergence time bounds of the scheme, and moreover 
allows one to compute the average to any desired accuracy: by picking c large (but 
not dependent on n), one can make the final result be as close to the average as 
desired. 

In Chapter 8, we investigate whether it is possible to push the $\log n$ storage
down even further; in particular, we show how the average may be approximately
computed with a deterministic algorithm in which each node stores only a constant
number of bits per connection it maintains. An algorithm for fixed graphs is given; the
dynamic graph case remains an open question.

Finally, Chapter 9 tackles the more general question: which functions can be
computed with a decentralized algorithm which uses a constant number of bits per 
link? The chapter assumes a consensus-like termination requirement in which the 
nodes only have to get the right answer eventually, but are not required to know 
when they have done so. The main result is a nearly tight characterization of the 
functions which can be computed deterministically in this setting. 
Chapter 2
Why averaging? 



Our goal in this chapter is to motivate the study of distributed averaging algorithms. 
We will describe two settings in which averaging turns out to be singularly useful. 
The first is an estimation problem in a sensor network setting; we will describe an 
averaging-based solution which avoids the pitfalls which plague alternative schemes. 
The second is a formation maintenance problem; we will show how basic theorems 
on averaging allow us to establish the satisfactory performance of some formation 
control schemes. 

2.1 A motivating example: distributed estimation in sensor networks




Figure 2-1: The set of online links at some given time. 
Consider a large collection of sensors, $1, \ldots, n$, that want to estimate an unknown
parameter $\theta \in \mathbb{R}$. Some of these sensors are able to measure a noise-corrupted
version of $\theta$; in particular, all nodes $i$ in some subset $S \subset \{1, \ldots, n\}$ measure

$$x_i = \theta + w_i.$$

We will assume, for simplicity, that the noises $w_i$ are jointly Gaussian and
independent at different sensors. Moreover, only node $i$ knows the statistics of its noise
$w_i$.

It is easy to see that the maximum likelihood estimate is given by

$$\hat\theta = \frac{\sum_{i \in S} x_i / \sigma_i^2}{\sum_{i \in S} 1 / \sigma_i^2},$$

where $\sigma_i^2$ is the variance of the (zero-mean) noise $w_i$.

Note that if $S = \{1, \ldots, n\}$ (i.e., every node makes a measurement), and all the
variances $\sigma_i^2$ are equal, the maximum likelihood estimate is just the average
$\hat\theta = (1/n) \sum_{i=1}^n x_i$.
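
For concreteness, the estimate can be computed as an inverse-variance weighted mean. The sketch below assumes a scalar parameter, independent zero-mean Gaussian noises with known variances $\sigma_i^2$, and made-up readings; the function name `ml_estimate` is hypothetical.

```python
def ml_estimate(measurements):
    """Inverse-variance weighted mean of (x_i, sigma_i^2) pairs."""
    num = sum(x / var for x, var in measurements)
    den = sum(1.0 / var for x, var in measurements)
    return num / den


# Hypothetical readings (x_i, sigma_i^2) from the measuring sensors in S.
readings = [(10.2, 1.0), (9.6, 4.0), (10.5, 0.25)]
theta_hat = ml_estimate(readings)

# With equal variances the estimate reduces to the plain average.
equal = [(1.0, 2.0), (2.0, 2.0), (6.0, 2.0)]
theta_equal = ml_estimate(equal)
```

Note that the estimate depends on the data only through the two sums in the numerator and denominator, which is what makes averaging-based computation natural here.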

The sensors would like to compute $\hat\theta$ in a distributed way. We do not assume
the existence of a "fusion center" to which the sensors can transmit measurements; 
rather, the sensors have to compute the answer by exchanging messages with their 
neighbors and performing computations. 

The sensors face an additional problem: there are communication links available 
through which they can exchange messages, but these links are unreliable. In par- 
ticular, links fail and come online in unpredictable ways. For example, there is no 
guarantee that any link will come online if the sensors wait long enough. It is
possible for a link to be online for some time, and then fail forever. Figure 2-1 shows
an example of what may happen: at any given time, only some pairs of nodes may 
exchange messages, and the network is effectively split into disconnected clusters. 

More concretely, we will assume a discrete sequence of times $t = 1, 2, 3, \ldots$, during
which the sensors may exchange messages. At time $t$, sensor $i$ may send a message to
its neighbors in the undirected graph $G(t) = (\{1, \ldots, n\}, E(t))$. We will also assume
that the graph $G(t)$ includes all self loops $(i, i)$. The problem is to devise good
algorithms for computing $\hat\theta$, and to identify minimal connectivity assumptions on the
sequence $G(t)$ under which such a computation is possible.

2.1.1 Flooding 

We now describe a possible answer. It is very plausible to make an additional as- 
sumption that sensors possess unique identifiers; this is the case in almost any wireless 
system. The sensors can use these identifiers to "flood" the network so that eventually, 
every sensor knows every single measurement that has been made. 

At time 1, sensor $i$ sends its own triplet $(\mathrm{id}, x_i, \sigma_i^2)$ to each of its neighbors. Each
sensor stores all the messages it has received. Moreover, a sensor maintains a "to 
broadcast" queue, and each time it hears a message with an id it has not heard before, 
it adds it to the tail of the queue. At times t = 2,3, . . ., each sensor broadcasts the 
top message from its queue.

If $G(t)$ is constant with time, and connected, then eventually each sensor learns
all the measurements that have been made. Once that happens, each sensor has all
the information it needs to compute the maximum likelihood estimate. Moreover,
the sensors do not even need to try to detect whether they have learned everything;
each sensor can simply maintain an estimate of $\theta$, and revise that estimate each time
it learns of a new measurement.
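
The flooding protocol on a fixed connected graph can be simulated in a few lines. The sketch below is an illustration under simplifying assumptions: every sensor makes a measurement, all broadcasts in a step happen simultaneously, and each message is removed from the queue after one broadcast (which suffices when links never fail; retransmission after link failures is not modeled). The function name, graph, and measurements are hypothetical.

```python
from collections import deque


def flood(triplets, edges, steps):
    """Flooding on a fixed undirected graph.

    triplets: dict node -> (node_id, x, var), each node's own measurement.
    edges:    iterable of undirected links (i, j).
    Returns dict node -> {node_id: triplet} of everything each node has heard.
    """
    nbrs = {i: set() for i in triplets}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    known = {i: {trip[0]: trip} for i, trip in triplets.items()}
    queue = {i: deque([trip]) for i, trip in triplets.items()}
    for _ in range(steps):
        # Each node with a nonempty queue broadcasts (and removes) its head.
        sent = {i: q.popleft() for i, q in queue.items() if q}
        for i, msg in sent.items():
            for j in nbrs[i]:
                if msg[0] not in known[j]:      # id not heard before
                    known[j][msg[0]] = msg
                    queue[j].append(msg)        # schedule for rebroadcast
    return known


# Hypothetical 4-sensor path 0 - 1 - 2 - 3 with made-up measurements.
triplets = {i: (i, 10.0 + i, 1.0) for i in range(4)}
known = flood(triplets, [(0, 1), (1, 2), (2, 3)], steps=20)
```

Each node ends up storing one entry per sensor in `known`, which is the source of the storage cost of flooding.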

If $G(t)$ is not fixed, flooding can still be expected to work. Indeed, each time a
link appears, there is opportunity for a piece of information to be learned. One can 
show that subject to only very minimal requirements on connectivity, every sensor 
eventually does learn every measurement. 

Let us state a theorem to this effect. We will use the notation $\cup_{t \in X} G(t)$ to mean
the graph obtained by forming the union of the edge sets of $G(t)$, $t \in X$, i.e.,

$$\cup_{t \in X} G(t) = \Big( \{1, \ldots, n\}, \bigcup_{t \in X} E(t) \Big).$$

A relatively light connectivity assumption is the following.

Assumption 2.1. (Connectivity) The graph

$$\cup_{s \geq t} G(s)$$

is connected for every $t$.

In words, this assumption says that the graph sequence $G(t)$ has enough edges
for connectivity, and that moreover this remains true after some finite set of graphs
is removed from the sequence.

Theorem 2.1. If Assumption 2.1 holds, then under the flooding protocol every node
eventually learns each triplet $(\mathrm{id}, x_i, \sigma_i^2)$.

Proof. (Sketch). Suppose that some triplet $(\mathrm{id}, x_i, \sigma_i^2)$ is not learned by node $j$. Let $A$
be the nonempty set of nodes that do learn this triplet; one can easily argue that the
number of edges in the graphs $E(t)$ between $A$ and $A^c$ is finite. But this contradicts
Assumption 2.1. □

It is possible to relax the assumption of this theorem: $\cup_{s \geq t} G(s)$ actually only
needs to be connected for a sufficiently long but finite time interval.

The problem with flooding, however, lies with its storage requirements: each
sensor needs to store $n$ pieces of information, i.e., it needs to store a list of id's whose
measurements it has already seen. This means that the total amount of storage
throughout the network is at least on the order of $n^2$.

We count the storage requirements of the numbers $x_i, \sigma_i^2$ as constant. When
dealing with estimation problems, it is convenient to assume that $x_i, \sigma_i^2$ are real
numbers, and we will maintain this technical assumption for now. Nevertheless, in
practice, these numbers will be truncated to some fixed number of bits (independent
of $n$). Thus it makes sense to think of transmitting each of the $x_i, \sigma_i^2$ as incurring a
fixed cost independent of $n$.

Thus tracing out the dependence on n, we have that at least bits must be 
stored, in addition to a fixed number of bits at each node to maintain truncated 
versions of $x_i, \sigma_i^2$ and the estimate $\hat\theta$. One hopes for the existence of a scheme whose
storage requirements scale more gracefully with the number of nodes n. 

2.1.2 Leader election based protocols 

The protocol we outline next has considerably nicer storage requirements. On the 
other hand, it will require some stringent assumptions on connectivity. We describe 
it next for the case of a fixed communication graph, i.e., when the graph sequence 
$G(t)$ does not depend on $t$.

First, the sensors elect one of them as a leader. There are various protocols for
doing this. If sensors have unique identifiers they may pick (in a distributed way) the
sensor with the largest or smallest id as the leader. Even in the absence of identifiers,
there are randomized protocols for leader election (see [1]) which take a number of
time steps on the order of the diameter^1 of the network, provided that messages of
size $O(\log n)$ bits can be sent at each time step.

Next, the sensors can build a spanning tree with the leader as the root. For
example, each sensor may pick as its parent a node one hop closer to the root.
Finally, the sensors may forward all of their information (i.e., their $(x_i, \sigma_i^2)$) to the
root, which can compute the answer and forward it back.

Our description of the algorithm is deliberately vague, as the details do not
particularly matter (e.g., which leader election algorithm is used). We would like to
mention, however, that it is even possible to avoid having the leader learn all the
information $(x_i, \sigma_i^2)$. For example, once a spanning tree is in place, each node may wait
to hear from all of its children, and then forward to the leader a sufficient statistic
for the measurements in its subtree.
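
The tree-based aggregation just described can be sketched as follows. This is a centralized simulation written under several assumptions: the graph is fixed and connected, nodes have unique integer ids and the largest id is elected leader, the tree is built by breadth-first search, and the sufficient statistic forwarded by each node is the pair $(\sum x_i/\sigma_i^2, \sum 1/\sigma_i^2)$ over its subtree. Function names and data are hypothetical.

```python
from collections import deque


def bfs_tree(adj, root):
    """Parent pointers of a breadth-first spanning tree rooted at `root`."""
    parent = {root: None}
    order = [root]
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u          # u is one hop closer to the root
                order.append(v)
                frontier.append(v)
    return parent, order


def tree_estimate(adj, data):
    """Aggregate (sum x/var, sum 1/var) up the tree; the root forms the estimate.

    adj:  dict node -> list of neighbors in a fixed connected graph.
    data: dict node -> (x, var) for the nodes that made a measurement.
    """
    leader = max(adj)                  # assumption: largest id wins the election
    parent, order = bfs_tree(adj, leader)
    stat = {u: [0.0, 0.0] for u in adj}
    for u, (x, var) in data.items():
        stat[u][0] += x / var
        stat[u][1] += 1.0 / var
    # Visit nodes in reverse discovery order, so children report before parents.
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            stat[p][0] += stat[u][0]
            stat[p][1] += stat[u][1]
    num, den = stat[leader]
    return num / den


# Hypothetical network; node 2 makes no measurement.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
data = {0: (10.2, 1.0), 1: (9.6, 4.0), 3: (10.5, 0.25)}
theta_tree = tree_estimate(adj, data)
```

Each node only ever holds two running sums, which is why the storage cost of this approach does not grow with the number of measurements.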

We state the existence of such protocols as a theorem: 

Theorem 2.2. If $G(t)$ does not depend on time, i.e., $G(t) = G$ for all $t$, it is possible
to compute $\hat\theta$ with high probability in $O(d(G))$ time steps. The nodes need to store and
forward messages of size $O(\log n)$ bits in each time step, as well as a constant number
of real numbers which are smooth functions of the measurements $x_i, \sigma_i^2$.

Observe that the theorem allows for the existence of protocols which (only)
work with high probability; this is due to the necessarily randomized nature of leader 
election protocols in the absence of identifiers. 

The above theorem is nearly optimal for any fixed graph. In other words, if the 
communication links are reliable, there is no reason to choose anything but the above 
algorithm. 

^1 The diameter of the graph $G$, denoted by $d(G)$, is the largest distance between any two nodes.
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On the other hand, if the graph sequence G{t) is changing unpredictably, this sort 
of approach immediately runs into problems. Maintaining a spanning tree appears 
to be impossible if the graph sequence changes dramatically from step to step, and 
other approaches are needed. 



2.2 Using distributed averaging 

We now describe a scheme which is both robust to link failures (it only needs
Assumption 2.1 to work) and has nice storage requirements (it only requires nodes to maintain
a constant number of real numbers which are smooth functions of the data $x_i, \sigma_i^2$).
However, it needs the additional assumption that the graphs $G(t)$ are undirected.
This condition is often satisfied in practice, for example if the sensors are connected 
by links whenever they are within a certain distance of each other. 

First, let us introduce some notation. Let $N_i(t)$ be the set of neighbors of node $i$ in
the graph $G(t)$. Recall that we assume self-arcs are always present, i.e., $(i,i) \in E(t)$
for all $i, t$, so that $i \in N_i(t)$ for all $i, t$. Let $d_i(t)$ be the degree of node $i$ in $G(t)$.

Let us first describe the scheme for the case where $S = \{1, \ldots, n\}$, i.e., every node
makes a measurement, and all $\sigma_i^2$ are the same. In this case, the maximum likelihood
estimate is just the average of the numbers $x_i$: $\hat{\theta} = (1/n) \sum_{i=1}^n x_i$.

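The scheme is as follows. Each node sets $x_i(0) = x_i$ and updates as

$$x_i(t+1) = \sum_{j \in N_i(t)} a_{ij}(t) x_j(t), \qquad (2.1)$$

where

$$a_{ij}(t) = \min\left(\frac{1}{d_i(t)}, \frac{1}{d_j(t)}\right) \text{ for } j \in N_i(t),\ j \neq i, \qquad a_{ij}(t) = 0 \text{ otherwise},$$

and

$$a_{ii}(t) = 1 - \sum_{j \in N_i(t),\ j \neq i} a_{ij}(t).$$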

Then: 



Proposition 2.1. If Assumption 2.1 holds and all the graphs $G(t)$ are undirected,
then

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{i=1}^n x_i = \hat{\theta}.$$

The above proposition is true because it is a special case of the following theorem:

Theorem 2.3. Consider the iteration

$$x(t+1) = A(t) x(t)$$

where:

1. $A(t)$ are doubly stochastic matrices.

2. If $(i,j) \in G(t)$, then $a_{ij} > 0$ and $a_{ji} > 0$, and if $(i,j) \notin G(t)$, then $a_{ij} = a_{ji} = 0$.

3. There is some $\eta > 0$ such that if $a_{ij} > 0$ then $a_{ij} \geq \eta$.

4. The graph sequence $G(t)$ is undirected.

5. Assumption 2.1 on the graph sequence $G(t)$ holds.

Then:

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{i=1}^n x_i(0),$$

for all $i$.

We will prove this theorem in Chapter 3. In this form, this theorem is a trivial
modification of the results in [HTJ |25l UM HHl IE] which themselves are based on the 
earlier results in [32] • 

Accepting the truth of this theorem for now, we can conclude that we have
described a simple way to compute $\hat{\theta}$. Every node just needs to store and update a single
real number $x_i(t)$. Nevertheless, subject to only the weak connectivity Assumption
2.1, every $x_i(t)$ approaches the correct $\hat{\theta}$. This scheme thus manages to avoid the
downsides that plague flooding and leader election (high storage, lack of robustness to link
failures).
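
As a minimal sketch of the scheme above, the following Python snippet runs the update of Eq. (2.1) on a ring whose links fail at random. The ring topology, the link probability, the seed, and all function names are illustrative assumptions rather than part of the text; the degree $d_i(t)$ is taken here to count the self-arc, which keeps every diagonal weight positive.

```python
import random

def metropolis_step(x, edges):
    """One update of Eq. (2.1); `edges` holds undirected pairs (i, j) with i != j."""
    n = len(x)
    neighbors = [{i} for i in range(n)]      # self-arcs are always present
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    deg = [len(nb) for nb in neighbors]      # assumption: degree counts the self-arc
    new_x = []
    for i in range(n):
        weights = {j: min(1.0 / deg[i], 1.0 / deg[j])
                   for j in neighbors[i] if j != i}
        a_ii = 1.0 - sum(weights.values())
        new_x.append(a_ii * x[i] + sum(w * x[j] for j, w in weights.items()))
    return new_x

def run_averaging(x0, steps, link_up_prob=0.3, seed=0):
    """Iterate on a ring in which each link is up independently at each step."""
    rng = random.Random(seed)
    n = len(x0)
    ring = [(i, (i + 1) % n) for i in range(n)]
    x = list(x0)
    for _ in range(steps):
        up = [e for e in ring if rng.random() < link_up_prob]
        x = metropolis_step(x, up)
    return x

x0 = [1.0, 5.0, 3.0, 7.0, 9.0, 2.0]
x_final = run_averaging(x0, steps=2000)
```

Each node stores a single number, and the sum of the entries is unchanged by every step, so the only value the nodes can agree on is the average of `x0`.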

Let us describe next how to use this idea in the general case where $S$ is a proper
subset of $\{1, \ldots, n\}$ and the $\sigma_i^2$ are not all equal. Each node sets $x_i(0) = x_i/\sigma_i^2$,
$y_i(0) = 1/\sigma_i^2$, and $z_i(0) = x_i(0)/y_i(0)$. If the node did not make a measurement, it
sets $x_i(0) = y_i(0) = 0$, and leaves $z_i(0)$ undefined. Each node updates as

$$x_i(t+1) = \sum_{j \in N_i(t)} a_{ij}(t) x_j(t),$$

$$y_i(t+1) = \sum_{j \in N_i(t)} a_{ij}(t) y_j(t),$$

$$z_i(t+1) = \frac{x_i(t+1)}{y_i(t+1)}.$$

Observe that and we have: 



Proposition 2.2. If Assumption 2.1 holds, then

$$\lim_{t \to \infty} z_i(t) = \hat{\theta}.$$



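^A matrix is called doubly stochastic if it is nonnegative and all of its rows and columns add up
to 1.

Proof. By Theorem 2.3,

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{j \in S} \frac{x_j}{\sigma_j^2}, \qquad \lim_{t \to \infty} y_i(t) = \frac{1}{n} \sum_{j \in S} \frac{1}{\sigma_j^2},$$

and so

$$\lim_{t \to \infty} z_i(t) = \frac{\lim_{t \to \infty} x_i(t)}{\lim_{t \to \infty} y_i(t)} = \frac{\sum_{j \in S} x_j/\sigma_j^2}{\sum_{j \in S} 1/\sigma_j^2} = \hat{\theta}.$$

□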



The punchline is that even this somewhat more general problem can be solved by
an algorithm that relies on Theorem 2.3. This solution has nice storage requirements
(nodes store only a constant number of real numbers which are smooth functions of
$x_i, \sigma_i^2$) and is robust to link failures.
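
The general scheme can be sketched along the same lines. The snippet below is an illustration under stated assumptions: the nodes sit on a path, the links alternate between two sets of disjoint pairs (a pairwise averaging update, which is symmetric and doubly stochastic), and the measurement values are made up. The names `pairwise_average` and `weighted_estimate` are hypothetical.

```python
def pairwise_average(v, pairs):
    """Disjoint pairs average their values: a symmetric, doubly stochastic update."""
    v = list(v)
    for i, j in pairs:
        v[i] = v[j] = (v[i] + v[j]) / 2.0
    return v

def weighted_estimate(measurements, n, steps):
    """`measurements` maps a node in S to its pair (x_i, sigma_i^2)."""
    x = [0.0] * n                      # nodes without a measurement start at 0
    y = [0.0] * n
    for i, (x_i, var_i) in measurements.items():
        x[i] = x_i / var_i
        y[i] = 1.0 / var_i
    even = [(i, i + 1) for i in range(0, n - 1, 2)]
    odd = [(i, i + 1) for i in range(1, n - 1, 2)]
    for t in range(steps):
        pairs = even if t % 2 == 0 else odd
        x = pairwise_average(x, pairs)
        y = pairwise_average(y, pairs)
    # z_i stays undefined (None) for as long as y_i is still zero
    return [x_i / y_i if y_i > 0 else None for x_i, y_i in zip(x, y)]

# Two of five nodes measure: x = 2 with variance 1, and x = 4 with variance 0.5.
z = weighted_estimate({0: (2.0, 1.0), 3: (4.0, 0.5)}, n=5, steps=2000)
```

With these inputs the weighted estimate is $(2/1 + 4/0.5)/(1/1 + 1/0.5) = 10/3$, and every entry of `z` approaches that value.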



2.3 A second motivating example: formation control

We now give another application of Theorem 2.3, this time to a certain formation
control problem. Our exposition is based on [83].

Suppose that the nodes have real positions $x_i(t)$ in $\mathbb{R}^d$; the initial positions $x_i(0)$
are arbitrary, and the nodes want to move into a formation characterized by positions
$p_1, \ldots, p_n$ in $\mathbb{R}^d$ (formations are defined up to translation). This formation is uniquely
characterized by the offset vectors $r_{ij} = p_i - p_j$. We assume that at every $t$, various
pairs of nodes $i, j$ succeed in measuring the offsets $x_i(t) - x_j(t)$. Let $E(t)$ be the set
of (undirected) edges $(i,j)$ corresponding to the measurements. The problem is how
to use these intermittent measurements to get into the desired formation.

Let us assume for simplicity that our $x_i(t)$ lie in $\mathbb{R}$ (we will dispense with this
assumption shortly). A very natural idea is to perform gradient descent on the function

$$\sum_{(i,j) \in E(t)} \left(x_i(t) - x_j(t) - r_{ij}\right)^2,$$

where each undirected edge is counted once. This leads to the following control law:

$$x_i(t+1) = x_i(t) - 2\Delta \sum_{j \in N_i(t)} \left(x_i(t) - x_j(t)\right) + 2\Delta \sum_{j \in N_i(t)} r_{ij}, \qquad (2.2)$$

where $\Delta$ is the stepsize. Essentially, every node repeatedly "looks around" and moves
to a new position depending on the positions of its neighbors and its desired offset
vectors $r_{ij}$.

Defining $b_i(t) = 2\Delta \sum_{j \in N_i(t)} r_{ij}$, the above equations may be rewritten as

$$x(t+1) = A(t) x(t) + b(t).$$

Now let $z$ be any translate of the given positions $(p_1, \ldots, p_n)$. Then, $z$ satisfies

$$z = A(t) z + b(t),$$

because the gradient at $x_i = z_i$ equals 0. Subtracting the two equations we get

$$x(t+1) - z = A(t)(x(t) - z).$$

Observe that if $\Delta < 1/(2n)$, then the matrix $A(t)$ is nonnegative, symmetric, and
has rows that add up to 1. Applying now Theorem 2.3, we get the following statement:

Proposition 2.3. Suppose that the nodes implement the iteration of Eq. (2.2). If:

1. $z'$ is the translate of $p$ whose average is the same as the average of $x_i(0)$.

2. The communication graph sequence $G(t)$ satisfies Assumption 2.1 (connectivity).

3. $\Delta < 1/(2n)$.

Then,

$$\lim_{t \to \infty} x_i(t) = z'_i,$$

for all $i$.

The proof is a straightforward application of Theorem 2.3. This proposition tells us
that subject to only minimal conditions on connectivity, the scheme of Eq. (2.2) will
converge to the formation in question.
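
The control law can be tried out numerically. The following is a minimal one-dimensional sketch, assuming four nodes, a made-up target formation, and measurements that alternate between two sets of disjoint pairs; `formation_step` and the variable names are hypothetical.

```python
def formation_step(x, p, edges, delta):
    """One step of Eq. (2.2) in one dimension; `edges` are the measured undirected pairs."""
    new_x = list(x)
    for i, j in edges:
        r_ij = p[i] - p[j]                       # desired offset between i and j
        new_x[i] += -2 * delta * (x[i] - x[j]) + 2 * delta * r_ij
        new_x[j] += -2 * delta * (x[j] - x[i]) - 2 * delta * r_ij   # r_ji = -r_ij
    return new_x

p = [0.0, 1.0, 2.0, 3.0]          # desired formation (defined up to translation)
x = [5.0, -1.0, 2.0, 8.0]         # arbitrary initial positions
n = len(x)
delta = 0.1                        # satisfies delta < 1/(2n) = 0.125
shift = sum(x) / n - sum(p) / n
target = [p_i + shift for p_i in p]    # translate of p with the same average as x(0)
schedule = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]   # which offsets are measured at even/odd t
for t in range(1000):
    x = formation_step(x, p, schedule[t % 2], delta)
```

Here the average of the initial positions is 3.5 and the average of `p` is 1.5, so the positions settle at `p` shifted by 2, as Proposition 2.3 predicts.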

We remark that we can replace the assumption that the $x_i$ are real numbers with
the assumption that the $x_i$ belong to $\mathbb{R}^d$. In this case, we can apply the control law
of Eq. (2.2), which decouples along each component of $x_i(t)$, and apply Proposition
2.3 to each component of $x_i(t)$.

Finally, we note that continuous-time versions of these updates may be presented;
see [83].

2.4 Concluding remarks 

Our goal in this chapter has been to explain why averaging algorithms are useful. We 
have described averaging-based algorithms for estimation and formation problems 
which are robust to link failures and have very light storage requirements at each 
node. 

Understanding the tradeoff between various types of distributed algorithms is still 
very much an open question, and the discussion in this chapter has only scratched 
the surface of it. One might additionally wonder how averaging algorithms perform
on a variety of other dimensions: convergence time, energy expenditure, robustness
to node failures, performance degradation with noise, and so on. 

Most of this thesis will be dedicated to exploring the question of convergence time.
In the next chapter we will give a basic introduction to averaging algorithms, and in
particular we will furnish a proof of Theorem 2.3. In the subsequent chapters, we will
turn to the question of designing averaging algorithms with good convergence time
guarantees.

Chapter 3

The basic convergence theorems 



Here we begin our analysis of averaging algorithms by proving some basic convergence
results. Our main goal is to prove Theorem 2.3 from the previous chapter, as well
as some natural variants of it. Almost all of this thesis will be spent refining and 
improving the simple results obtained by elementary means in this chapter. The 
results we will present may be found in [911 [921 [TH [521 [671 lU] • We also recommend 
the paper [73] for a considerable generalization of the results found here. The material 
presented here appeared earlier in the paper [T7| and the M.S. thesis [77] . 

3.1 Setup and assumptions 

We consider a set $N = \{1, \ldots, n\}$ of nodes, each starting with a real number stored in
memory. The nodes attempt to compute the average of these numbers by broadcasting
these numbers and repeatedly combining them by forming convex combinations. We
will first only be concerned with the convergence of this process.

Each node $i$ starts with a scalar value $x_i(0)$. The vector $x(t) = (x_1(t), \ldots, x_n(t))$
with the values held by the nodes at time $t$, is updated according to the equation
$x(t+1) = A(t) x(t)$, or

$$x_i(t+1) = \sum_{j=1}^n a_{ij}(t) x_j(t), \qquad (3.1)$$

where $A(t)$ is a nonnegative matrix with entries $a_{ij}(t)$, and where the updates are
carried out at some discrete set of times which we will take, for simplicity, to be the
nonnegative integers. We will refer to this scheme as the agreement algorithm.

We will assume that the row-sums of $A(t)$ are equal to 1, so that $A(t)$ is a stochastic
matrix. In particular, $x_i(t+1)$ is a weighted average of the values $x_j(t)$ held by the
nodes at time $t$. We are interested in conditions that guarantee the convergence of
each $x_i(t)$ to a constant, independent of $i$.

Throughout, we assume the following.

Assumption 3.1 (non-vanishing weights). The matrix $A(t)$ is nonnegative, stochastic,
and has positive diagonal. Moreover, there exists some $\eta > 0$ such that if $a_{ij}(t) > 0$
then $a_{ij}(t) \geq \eta$.

Intuitively, whenever $a_{ij}(t) > 0$, node $j$ communicates its current value $x_j(t)$ to
node $i$. Each node $i$ updates its own value by forming a weighted average of its own
value and the values it has just received from other nodes.

The communication pattern at each time step can be described in terms of a
directed graph $G(t) = (N, E(t))$, where $(j, i) \in E(t)$ if and only if $a_{ij}(t) > 0$. Note that
$(i, i) \in E(t)$ for all $i, t$ since $A(t)$ has positive diagonal. A minimal assumption is that
starting at an arbitrary time $t$, and for any $i, j$, there is a sequence of communications
through which node $i$ will influence (directly or indirectly) the value held by node $j$.
This is Assumption 2.1 from the previous chapter.

We note various special cases of possible interest. 

Fixed coefficients: There is a fixed matrix $A$, with entries $a_{ij}$ such that, for each
$t$, and for each $i \neq j$, we have $a_{ij}(t) \in \{0\} \cup \{a_{ij}\}$ (depending on whether there is a
communication from $j$ to $i$ at that time). This is the case presented in [T^ .

Symmetric model: If $(i,j) \in E(t)$ then $(j,i) \in E(t)$. That is, whenever $i$
communicates to $j$, there is a simultaneous communication from $j$ to $i$.

Equal neighbor model: Here,

$$a_{ij}(t) = \begin{cases} 1/d_i(t), & \text{if } j \in N_i(t), \\ 0, & \text{if } j \notin N_i(t). \end{cases}$$

This model is a linear version of a model considered by Vicsek et al. [HI]. Note that
here the constant $\eta$ of Assumption 3.1 is equal to $1/n$.

Metropolis model: Here,

$$a_{ij}(t) = \begin{cases} 1/\max(d_i(t), d_j(t)), & \text{if } j \in N_i(t),\ i \neq j, \\ 0, & \text{if } j \notin N_i(t), \end{cases}$$

and $a_{ii}(t) = 1 - \sum_{j \neq i} a_{ij}(t)$.

The Metropolis model is similar to the equal-neighbor model, but has the advantage
of symmetry: $a_{ij}(t) = a_{ji}(t)$.

Pairwise averaging model ([20]): This is the special case of both the symmetric
model and of the equal neighbor model in which, at each time, there is a set of
disjoint pairs of nodes who communicate with each other. If $i$ communicates with $j$,
then $x_i(t+1) = x_j(t+1) = (x_i(t) + x_j(t))/2$. Note that the sum $x_1(t) + \cdots + x_n(t)$ is
conserved; therefore, if consensus is reached, it has to be on the average of the initial
values of the nodes.

The assumption below is a strengthening of Assumption 2.1 on connectivity. We
will see that it is sometimes necessary for convergence.

Assumption 3.2 ($B$-connectivity). There exists an integer $B > 0$ such that the
directed graph

$$(N, E(kB) \cup E(kB+1) \cup \cdots \cup E((k+1)B - 1))$$

is strongly connected for all integers $k \geq 0$.
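
To make the condition concrete, here is a small Python sketch that tests Assumption 3.2 on a finite prefix of an edge sequence. It can only refute $B$-connectivity on the windows it sees, since the assumption concerns the whole infinite sequence; the function names and the example sequence are illustrative assumptions.

```python
def strongly_connected(n, edges):
    """True if the directed graph on nodes 0..n-1 with arc set `edges` is strongly connected."""
    def reaches_all(adj):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n
    fwd = [[] for _ in range(n)]
    bwd = [[] for _ in range(n)]
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    # node 0 must reach everyone, and everyone must reach node 0
    return reaches_all(fwd) and reaches_all(bwd)

def is_B_connected(n, edge_sets, B):
    """Check every complete window E(kB), ..., E((k+1)B - 1) of a finite prefix."""
    for k in range(len(edge_sets) // B):
        union = set().union(*edge_sets[k * B:(k + 1) * B])
        if not strongly_connected(n, union):
            return False
    return True

# One arc per time step, cycling 0 -> 1, 1 -> 2, 2 -> 0.
sequence = [{(0, 1)}, {(1, 2)}, {(2, 0)}] * 4
ok_with_B3 = is_B_connected(3, sequence, 3)
ok_with_B2 = is_B_connected(3, sequence, 2)
```

For this sequence every window of length 3 contains the full directed cycle, while windows of length 2 do not.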

3.2 Convergence results in the absence of delays. 

We say that the agreement algorithm guarantees asymptotic consensus if the following
holds: for every $x(0)$, and for every sequence $\{A(t)\}$ allowed by whatever assumptions
have been placed, there exists some $c$ such that $\lim_{t \to \infty} x_i(t) = c$, for all $i$.

Theorem 3.1. Under Assumptions 3.1 (non-vanishing weights) and 3.2 ($B$-connectivity),
the agreement algorithm guarantees asymptotic consensus.

Theorem 3.1 may be found in [52]; a slightly different version is in [92], [91]. We
next give an informal account of its proof.

Sketch of proof. The proof has several steps.

Step 1: Let us define the notion of a path in the time-varying graph $G(t)$. A path $p$
from $a$ to $b$ of length $l$ starting at time $t$ is a sequence of edges $(k_0, k_1), (k_1, k_2), \ldots, (k_{l-1}, k_l)$
such that $k_0 = a$, $k_l = b$, and $(k_0, k_1) \in E(t)$, $(k_1, k_2) \in E(t+1)$, and so on. We
will use $c(p)$ to denote the product

$$c(p) = \prod_{i=0}^{l-1} a_{k_{i+1} k_i}(t+i).$$

Define

$$\Phi(t_1, t_2) = A(t_2 - 1) A(t_2 - 2) \cdots A(t_1),$$

and let the $(i,j)$'th entry of this matrix be denoted by $\phi_{ij}(t_1, t_2)$. The following fact
can be established by induction:

$$\phi_{ij}(t_1, t_2) = \sum_{p} c(p),$$

where the sum runs over the paths $p$ from $j$ to $i$ of length $t_2 - t_1$ starting at time $t_1$
(recall that an edge $(j,i)$ corresponds to a positive weight $a_{ij}$).
A consequence is that if $\phi_{ij}(t_1, t_2) > 0$, then Assumption 3.1 implies $\phi_{ij}(t_1, t_2) \geq \eta^{t_2 - t_1}$.

Step 2: Assumptions 3.1 and 3.2 have the following implication: for any two nodes
$i, j$, there is a path of length $nB$ that begins at $i$ and ends at $j$.

This implication may be proven by induction on the following statement: for any
node $i$ there are at least $m$ distinct nodes $j$ such that there is a path of length $mB$
from $i$ to $j$. The proof crucially relies on Assumption 3.1, which implies that all the
self loops $(i,i)$ belong to every edge set $E(t)$.

Step 3: Putting Steps 1 and 2 together, we get that $\Phi(kB, (k+n)B)$ is a matrix
whose every entry is bounded below by $\eta^{nB}$. The final step is to argue that

$$Q_m Q_{m-1} \cdots Q_1 x$$

converges, as $m \to \infty$, to a multiple of the all-ones vector $\mathbf{1}$ for any sequence of
stochastic matrices $Q_i$ having this property and any initial vector $x$. This is true
because for any such matrix $Q_i$,

$$\max_k (Q_i x)_k - \min_k (Q_i x)_k \leq (1 - \eta^{nB}) (\max_k x_k - \min_k x_k).$$

□
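
The contraction used in Step 3 is easy to check numerically. In the sketch below, `alpha` plays the role of the lower bound $\eta^{nB}$ on the entries; the way the random stochastic matrices are generated, and the names used, are assumptions made only for illustration.

```python
import random

def spread(v):
    """The quantity max_k v_k - min_k v_k."""
    return max(v) - min(v)

def random_stochastic_with_floor(n, alpha, rng):
    """Row-stochastic matrix with every entry >= alpha (requires n * alpha <= 1)."""
    rows = []
    for _ in range(n):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        # each row is alpha everywhere plus a random split of the remaining mass
        rows.append([alpha + (1.0 - n * alpha) * r / total for r in raw])
    return rows

def apply(Q, x):
    return [sum(q * x_j for q, x_j in zip(row, x)) for row in Q]

rng = random.Random(1)
n, alpha = 4, 0.05
ratios = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    Q = random_stochastic_with_floor(n, alpha, rng)
    ratios.append(spread(apply(Q, x)) / spread(x))
worst_ratio = max(ratios)
```

In every trial the spread shrinks by at least the factor $1 - \alpha$, so repeated application drives all entries to a common value.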

In the absence of $B$-connectivity, the algorithm does not guarantee asymptotic
consensus, as shown by Example 1 below (Exercise 3.1, in p. 517 of [S]). In particular,
convergence to consensus fails even in the special case of the equal neighbor model.
The main idea is that the agreement algorithm can closely emulate a nonconvergent
algorithm that keeps executing the three instructions $x_1 := x_3$, $x_3 := x_2$, $x_2 := x_1$,
one after the other.

Example 1. Let $n = 3$, and suppose that $x(0) = (0, 0, 1)$. Let $\epsilon_1$ be a small positive
constant. Consider the following sequence of events. Node 3 communicates to node
1; node 1 forms the average of its own value and the received value. This is repeated
$t_1$ times, where $t_1$ is large enough so that $x_1(t_1) \geq 1 - \epsilon_1$. Thus, $x(t_1) \approx (1, 0, 1)$.
We now let node 2 communicate to node 3, $t_2$ times, where $t_2$ is large enough so
that $x_3(t_1 + t_2) \leq \epsilon_1$. In particular, $x(t_1 + t_2) \approx (1, 0, 0)$. We now repeat the above
two processes, infinitely many times. During the $k$th repetition, $\epsilon_1$ is replaced by
$\epsilon_k$ (and $t_1, t_2$ get adjusted accordingly). Furthermore, by permuting the nodes at each
repetition, we can ensure that Assumption 2.1 is satisfied. After $k$ repetitions, it can
be checked that $x(t)$ will be within $1 - \epsilon_1 - \cdots - \epsilon_k$ of a unit vector. Thus, if we
choose the $\epsilon_k$ so that $\sum_{k=1}^{\infty} \epsilon_k < 1/2$, asymptotic consensus will not be obtained.

On the other hand, in the presence of symmetry, the $B$-connectivity Assumption
3.2 is unnecessary. This result is proved in [67] and [25] for the special case of the
symmetric equal neighbor model and in [73], [19] for the more general symmetric
model. A more general result will be established in Theorem 3.4 below.

Theorem 3.2. Under Assumptions 2.1 and 3.1, and for the symmetric model, the
agreement algorithm guarantees asymptotic consensus.

3.3 Convergence in the presence of delays. 

The model considered so far assumes that messages from one node to another are 
immediately delivered. However, in a distributed environment, and in the presence 
of communication delays, it is conceivable that a node will end up averaging its own 
value with an outdated value of another node. A situation of this type falls within 
the framework of distributed asynchronous computation developed in [H] . 

Communication delays are incorporated into the model as follows: when node
$i$, at time $t$, uses the value $x_j$ from another node, that value is not necessarily the
most recent one, $x_j(t)$, but rather an outdated one, $x_j(\tau_j^i(t))$, where $0 \leq \tau_j^i(t) \leq t$,
and where $t - \tau_j^i(t)$ represents communication and possibly other types of delay. In
particular, $x_i(t)$ is updated according to the following formula:

$$x_i(t+1) = \sum_{j=1}^n a_{ij}(t) x_j(\tau_j^i(t)). \qquad (3.2)$$

We make the following assumption on the $\tau_j^i(t)$.

Assumption 3.3. (Bounded delays) (a) If $a_{ij}(t) = 0$, then $\tau_j^i(t) = t$.

(b) $\tau_i^i(t) = t$, for all $i, t$.

(c) There exists some $B > 0$ such that $t - B + 1 \leq \tau_j^i(t) \leq t$, for all $i, j, t$.

Assumption 3.3(a) is just a convention: when $a_{ij}(t) = 0$, the value of $\tau_j^i(t)$ has no
effect on the update. Assumption 3.3(b) is quite natural, since a node generally has
access to its own most recent value. Assumption 3.3(c) requires delays to be bounded
by some constant $B$.

The next result, from [91], [92], is a generalization of Theorem 3.1. The idea of
the proof is similar to the one outlined for Theorem 3.1, except that we now define
$m(t) = \min_i \min_{s = t, t-1, \ldots, t-B+1} x_i(s)$ and $M(t) = \max_i \max_{s = t, t-1, \ldots, t-B+1} x_i(s)$. For
convenience, we will adopt the definition that $x_i(t) = x_i(0)$ for all negative $t$. Once
more, one shows that the difference $M(t) - m(t)$ decreases by a constant factor after
a bounded amount of time.

Theorem 3.3. Under Assumptions 3.1, 3.2, 3.3 (non-vanishing weights, bounded
intercommunication intervals, and bounded delays), the agreement algorithm with delays
[cf. Eq. (3.2)] guarantees asymptotic consensus.

Theorem 3.3 assumes bounded intercommunication intervals and bounded delays.
The example that follows (Example 1.2, in p. 485 of [H]) shows that Assumption
3.3(c) (bounded delays) cannot be relaxed. This is the case even for a symmetric
model, or the further special case where $E(t)$ has exactly four arcs $(i,i)$, $(j,j)$, $(i,j)$,
and $(j,i)$ at any given time $t$, and these satisfy $a_{ij}(t) = a_{ji}(t) = 1/2$, as in the pairwise
averaging model.

Example 2. We have two nodes who initially hold the values $x_1(0) = 0$ and $x_2(0) = 1$,
respectively. Let $t_k$ be an increasing sequence of times, with $t_0 = 0$ and $t_{k+1} - t_k \to \infty$.
If $t_k \leq t < t_{k+1}$, the nodes update according to

$$x_1(t+1) = (x_1(t) + x_2(t_k))/2,$$
$$x_2(t+1) = (x_1(t_k) + x_2(t))/2.$$

We will then have $x_1(t_1) = 1 - \epsilon_1$ and $x_2(t_1) = \epsilon_1$, where $\epsilon_1 > 0$ can be made
arbitrarily small, by choosing $t_1$ large enough. More generally, between time $t_k$ and
$t_{k+1}$, the absolute difference $|x_1(t) - x_2(t)|$ contracts by a factor of $1 - 2\epsilon_k$, where the
corresponding contraction factors $1 - 2\epsilon_k$ approach 1. If the $\epsilon_k$ are chosen so that
$\sum_k \epsilon_k < \infty$, then $\prod_{k=1}^{\infty} (1 - 2\epsilon_k) > 0$, and the disagreement $|x_1(t) - x_2(t)|$ does not
converge to zero.
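
Example 2 can be simulated directly. The sketch below assumes one particular choice of the times $t_k$, namely phase lengths $t_{k+1} - t_k = k + 2$; this choice and the function name are illustrative and not taken from the text. Within a phase each node keeps averaging with the stale value the other node held at the start of the phase, so the delay grows without bound.

```python
def example2_disagreement(num_phases):
    """Return |x_1 - x_2| at the end of each phase, with phase k lasting k + 2 steps."""
    x1, x2 = 0.0, 1.0
    history = []
    for k in range(num_phases):
        stale1, stale2 = x1, x2          # the values x_1(t_k), x_2(t_k) used all phase long
        for _ in range(k + 2):
            x1, x2 = (x1 + stale2) / 2.0, (stale1 + x2) / 2.0
        history.append(abs(x1 - x2))
    return history

history = example2_disagreement(40)
```

A phase of length $L$ multiplies the disagreement by $1 - 2 \cdot 2^{-L}$, so here the factors are $1/2, 3/4, 7/8, \ldots$ and their product stays bounded away from zero (it settles near 0.29).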

According to the preceding example, the assumption of bounded delays cannot be
relaxed. On the other hand, the assumption of bounded intercommunication intervals
can be relaxed, in the presence of symmetry, leading to the following generalization
of Theorem 3.2.

Theorem 3.4. Under Assumptions 2.1 (connectivity), 3.1 (non-vanishing weights),
and 3.3 (bounded delays), and for the symmetric model, the agreement algorithm with
delays [cf. Eq. (3.2)] guarantees asymptotic consensus.

Proof. Let

$$M_i(t) = \max\{x_i(t), x_i(t-1), \ldots, x_i(t-B+1)\},$$
$$M(t) = \max_i M_i(t),$$
$$m_i(t) = \min\{x_i(t), x_i(t-1), \ldots, x_i(t-B+1)\},$$
$$m(t) = \min_i m_i(t).$$

Recall that we are using the convention that $x_i(t) = x_i(0)$ for all negative $t$. An easy
inductive argument, as in p. 512 of [H], shows that the sequences $m(t)$ and $M(t)$ are
nondecreasing and nonincreasing, respectively. The convergence proof rests on the
following lemma.

Lemma 3.1. If $m(\tau - B) = 0$ and $M(\tau) = 1$, then there exists a time $\tau' \geq \tau$ such
that $M(\tau') - m(\tau' - B) \leq 1 - \eta^{nB}$.

Given Lemma 3.1, the convergence proof is completed as follows. Using the
linearity of the algorithm, there exists a time $\tau_1$ such that
$M(\tau_1) - m(\tau_1 - B) \leq (1 - \eta^{nB})(M(B) - m(0))$.
By applying Lemma 3.1, with $\tau$ replaced by $\tau_1$, and using
induction, we see that for every $k$ there exists a time $\tau_k$ such that
$M(\tau_k) - m(\tau_k - B) \leq (1 - \eta^{nB})^k (M(B) - m(0))$,
which converges to zero. This, together with the monotonicity
properties of $m(t)$ and $M(t)$, implies that $m(t)$ and $M(t)$ converge to a common
limit, which is equivalent to asymptotic consensus. □

Proof of Lemma 3.1. For $k = 1, \ldots, n$, we say that "Property $P_k$ holds at time $t$" if
there exist at least $k$ indices $i$ for which $m_i(t) \geq \eta^{kB}$.

We assume, without loss of generality, that $m(\tau - B) = 0$ and $M(\tau) = 1$. Then,
$m(t) \geq 0$ for all $t \geq \tau - B$ by the monotonicity of $m(t)$. Furthermore, there exists
some $i$ and some $t' \in \{\tau - B + 1, \tau - B + 2, \ldots, \tau\}$ such that $x_i(t') = 1$. Using the
inequality $x_i(t+1) \geq \eta x_i(t)$, we obtain $m_i(t' + B) \geq \eta^B$. This shows that there exists
a time at which Property $P_1$ holds.

We continue inductively. Suppose that $k < n$ and that Property $P_k$ holds at some
time $t$. Let $S$ be a set of cardinality $k$ containing indices $i$ for which $m_i(t) \geq \eta^{kB}$, and
let $S^c$ be the complement of $S$. Let $t'$ be the first time, greater than or equal to $t$, at
which $a_{ij}(t') \neq 0$, for some $j \in S$ and $i \in S^c$ (i.e., a node $j$ in $S$ gets to influence
the value of a node $i$ in $S^c$). Such a time exists by the connectivity assumption
(Assumption 2.1).

Note that between times $t$ and $t'$, the nodes $\ell$ in the set $S$ only form convex
combinations between the values of the nodes in the set $S$ (this is a consequence of
the symmetry assumption). Since all of these values are bounded below by $\eta^{kB}$, it
follows that this lower bound remains in effect, and that $m_\ell(t') \geq \eta^{kB}$, for all $\ell \in S$.

For times $s \geq t'$, and for every $\ell \in S$, we have $x_\ell(s+1) \geq \eta x_\ell(s)$, which implies
that $x_\ell(s) \geq \eta^{kB} \eta^B$, for $s \in \{t'+1, \ldots, t'+B\}$. Therefore, $m_\ell(t'+B) \geq \eta^{(k+1)B}$, for
all $\ell \in S$.

Consider now a node $i \in S^c$ for which $a_{ij}(t') \neq 0$. We have

$$x_i(t'+1) \geq a_{ij}(t') x_j(\tau_j^i(t')) \geq \eta m_j(t') \geq \eta^{kB+1}.$$

Using also the fact $x_i(s+1) \geq \eta x_i(s)$, we obtain that $m_i(t'+B) \geq \eta^{(k+1)B}$. Therefore,
at time $t'+B$, we have $k+1$ nodes with $m_i(t'+B) \geq \eta^{(k+1)B}$ (namely, the nodes in
$S$, together with node $i$). It follows that Property $P_{k+1}$ is satisfied at time $t'+B$.

This inductive argument shows that there is a time $\tau'$ at which Property $P_n$ is
satisfied. At that time $m_i(\tau') \geq \eta^{nB}$ for all $i$, which implies that $m(\tau') \geq \eta^{nB}$. On
the other hand, $M(\tau'+B) \leq M(\tau) = 1$, which proves that
$M(\tau'+B) - m(\tau') \leq 1 - \eta^{nB}$. □

Now that we have proved Theorem 3.4, let us give a variation of it which will
ensure not only convergence but convergence to the average.

Assumption 3.4. (Double stochasticity) The matrix $A(t)$ is column-stochastic for
all $t$, i.e.,

$$\sum_{i=1}^n a_{ij}(t) = 1,$$

for all $j$ and $t$.

Note that Assumption 3.1 ensures that the matrix $A(t)$ is only row-stochastic.
The above assumption together with Assumption 3.1 ensures that $A(t)$ is actually
doubly stochastic.

Theorem 3.5. Under Assumptions 2.1 (connectivity), 3.1 (non-vanishing weights),
and 3.4 (double stochasticity), and for the symmetric model, the agreement algorithm
(without delays) satisfies

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{i=1}^n x_i(0).$$

Proof. By Theorem 3.4, every $x_i(t)$ converges to the same value. Assumption 3.4
ensures that $\sum_{i=1}^n x_i(t)$ is preserved from iteration to iteration:

$$\sum_{i=1}^n x_i(t+1) = \mathbf{1}^T x(t+1) = \mathbf{1}^T A(t) x(t) = \mathbf{1}^T x(t) = \sum_{i=1}^n x_i(t),$$

where we use the double stochasticity of $A(t)$ to conclude that $\mathbf{1}^T A(t) = \mathbf{1}^T$. This
immediately implies that the final limit is the average of the initial values. □
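
The role of double stochasticity can be seen on a three-node path. The sketch below, with hypothetical helper names and made-up initial values, builds the equal-neighbor weights (row-stochastic only) and the Metropolis weights (symmetric, hence doubly stochastic) on the same fixed graph and iterates both.

```python
def build_weights(neighbors, model):
    """`neighbors[i]` is the set N_i including i itself; model is 'equal' or 'metropolis'."""
    n = len(neighbors)
    deg = [len(nb) for nb in neighbors]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            if j != i:
                if model == "equal":
                    A[i][j] = 1.0 / deg[i]
                else:
                    A[i][j] = 1.0 / max(deg[i], deg[j])
        A[i][i] = 1.0 - sum(A[i])        # the diagonal takes the remaining mass
    return A

def iterate(A, x, steps):
    for _ in range(steps):
        x = [sum(a * x_j for a, x_j in zip(row, x)) for row in A]
    return x

neighbors = [{0, 1}, {0, 1, 2}, {1, 2}]      # path 0 - 1 - 2 with self-arcs
x0 = [0.0, 0.0, 7.0]
x_equal = iterate(build_weights(neighbors, "equal"), x0, 200)
x_metro = iterate(build_weights(neighbors, "metropolis"), x0, 200)
```

Both runs reach consensus, but only the Metropolis weights preserve the sum: they converge to the average $7/3$, whereas the equal-neighbor weights converge to the degree-weighted value $2$.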

Finally, let us observe that Theorem 2.3 from the previous chapter is a special
case of the theorem we have just proved.

Proof of Theorem 2.3. Observe that the assumptions of Theorem 3.5 are present
among the assumptions of Theorem 2.3, except for Assumption 3.4, which needs to
be verified. To argue that the matrices $A(t)$ in Theorem 2.3 are doubly stochastic,
we just observe that they are symmetric and stochastic. □

3.4 Relaxing symmetry 

The symmetry condition [$(i,j) \in E(t)$ iff $(j,i) \in E(t)$] used in Theorem 3.4 is
somewhat unnatural in the presence of communication delays, as it requires perfect
synchronization of the update times. A looser and more natural assumption is the
following.

Assumption 3.5 (Bounded round-trip times). There exists some $B > 0$ such that
whenever $(i,j) \in E(t)$, then there exists some $\tau$ that satisfies $|t - \tau| < B$ and
$(j,i) \in E(\tau)$.

Assumption 13.51 allows for protocols such as the following. Node i sends its value 
to node j. Node j responds by sending its own value to node i. Both nodes up- 
date their values (taking into account the received messages), within a bounded time 
from receiving the other node's value. In a realistic setting, with unreliable com- 
munications, even this loose symmetry condition may be impossible to enforce with 
absolute certainty. One can imagine more complicated protocols based on an ex- 
change of acknowledgments, but fundamental obstacles remain (see the discussion of 
the "two-army problem" in pp. 32-34 of [12]). A more realistic model would introduce
a positive probability that some of the updates are never carried out. (A simple
possibility is to assume that each $a_{ij}(t)$, with $i \neq j$, is changed to a zero, independently,
and with a fixed probability.) The convergence result that follows remains valid in 
such a probabilistic setting (with probability 1). Since no essential new insights are 
provided, we only sketch a proof for the deterministic case. 

Theorem 3.6. Under Assumptions 2.1 (connectivity), 3.1 (non-vanishing weights),
3.3 (delays) and 3.5 (bounded round-trip times), the agreement algorithm with delays
[cf. Eq. (3.2)] guarantees asymptotic consensus.

Proof outline. A minor change is needed in the proof of Lemma 3.1. In particular, we
define $P_k$ as the event that there exist at least $k$ indices $l$ for which $m_l(t) \geq \eta^{2kB}$. It
follows that $P_1$ holds at time $t = 2B$.

By induction, let $P_k$ hold at time $t$, and let $S$ be the set of cardinality $k$ containing
indices $l$ for which $m_l(t) \geq \eta^{2kB}$. Furthermore, let $\tau$ be the first time after time $t$
that $a_{ij}(\tau) \neq 0$ where exactly one of $i, j$ is in $S$. Along the same lines as in the
proof of Lemma 3.1, $m_l(\tau) \geq \eta^{2kB}$ for $l \in S$; since $x_l(t+1) \geq \eta x_l(t)$, it follows
that $m_l(\tau + 2B) \geq \eta^{2(k+1)B}$ for each $l \in S$. By our assumptions, exactly one of
$i, j$ is in $S^c$. If $i \in S^c$, then $x_i(\tau+1) \geq a_{ij}(\tau) x_j(\tau_j^i(\tau)) \geq \eta^{2kB+1}$ and consequently
$x_i(\tau + 2B) \geq \eta^{2B-1} \eta^{2kB+1} = \eta^{2(k+1)B}$. If $j \in S^c$, then there must exist a time
$\tau_j \in \{\tau+1, \tau+2, \ldots, \tau+B-1\}$ with $a_{ji}(\tau_j) > 0$. It follows that:

$$m_j(\tau + 2B) \geq \eta^{\tau + 2B - (\tau_j + 1)} x_j(\tau_j + 1) \geq \eta^{2(k+1)B}.$$

Therefore, $P_{k+1}$ holds at time $\tau + 2B$ and the induction is complete. □



3.5 Concluding remarks 

In this chapter, we have presented some basic convergence results on the averaging 
iterations $x(t+1) = A(t) x(t)$. In particular, we proved Theorem 2.3 from the previous
chapter, as well as several variations of it involving asymmetry and delays.

There is, however, one troubling feature of the results so far: the convergence
time bounds which follow from our proofs are quite large. We have shown that after
$nB$ steps, an appropriately defined measure of convergence shrinks by a factor of
$1 - \eta^{nB}$. Considering that $\eta$ can be as small as $1/n$ (for example, in the Metropolis
model), this means that one must wait on the order of $n^{nB} nB \log(1/\epsilon)$ steps for the
same measure of convergence to shrink by a factor of $\epsilon$. It goes without saying that
this is an enormous number even for relatively small $n, B$.

One might hope that the bounds we have derived are lax. Unfortunately, one 
can actually construct examples of graph sequences on which convergence takes time 
exponential in n. An example may be found in [85]; an example with an undirected 
graph may be found in an unpublished manuscript by Cao, Spielman, and Morse. 

In the next several chapters, we will be concerned with the possibility of designing 
averaging algorithms with better guarantees. A first goal is to replace the exponential 
scaling with n by a polynomial one. 

Chapter 4

Averaging in polynomial time 



4.1 Convergence time 

In the previous chapter, we proved Theorem 3.5, which states that subject to a few
natural conditions the iteration

$$x(t+1) = A(t) x(t)$$

results in

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{i=1}^n x_i(0).$$

A special case of this result is Theorem 2.3, stated earlier. The assumptions include:
Assumption 2.1, ensuring that the graphs $G(t)$ contain enough links, Assumption 3.1
on the weights, and the "double stochasticity" Assumption 3.4. Finally, we also had
to assume we were in the "symmetric model." Here our goal will be to reproduce the
result but with better bounds on convergence time, which scale polynomially, rather
than exponentially, in $n$.

For this, we will need to slightly strengthen our connectivity assumptions. It is
obvious that with Assumption 2.1 no effective bounds on convergence can hold, since
the sequence $G(t)$ may contain arbitrarily many empty graphs. Thus we will replace
Assumption 2.1 with the slightly stronger Assumption 3.2. On the positive side, with
the stronger Assumption 3.2, we can dispense with the assumption that we are in the
"symmetric model."

As a convergence measure, we use the "sample variance" of a vector $x \in \mathbb{R}^n$,
defined as

$$V(x) = \sum_{i=1}^n (x_i - \bar{x})^2,$$

where $\bar{x}$ is the average of the entries of $x$:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i.$$

We are interested in providing an upper bound on the number of iterations it
takes for the "sample variance" $V(x(t))$ to decrease to a small fraction of its initial
value $V(x(0))$. The main result of this chapter is the following theorem.



Theorem 4.1. Let Assumptions 3.1 (non-vanishing weights), 3.2 ($B$-connectivity),
and 3.4 (double stochasticity) hold. Then there exists an absolute constant $c$ such
that we have

$$V(x(t)) \leq \epsilon V(x(0)) \quad \text{for all } t \geq c (n^2/\eta) B \log(1/\epsilon).$$



This is the "polynomial time averaging" result alluded to in the title of this
chapter. Our exposition follows the paper [TB] where this material first appeared.
Note that this bound is exponentially better than the convergence time bound of
$O((1/\eta)^{nB} nB \log(1/\epsilon))$ which follows straightforwardly from the arguments of the
previous chapter.
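
To get a feel for the gap between the two bounds, the following sketch evaluates both expressions for small parameters. The constants hidden in the $O(\cdot)$ notation are dropped and the absolute constant $c$ is set to 1 purely as a placeholder, so only the ratio of the two quantities is meaningful; the function names are hypothetical.

```python
import math

def exponential_bound(n, B, eta, eps):
    """(1/eta)^(nB) * n * B * log(1/eps), with constants dropped."""
    return (1.0 / eta) ** (n * B) * n * B * math.log(1.0 / eps)

def polynomial_bound(n, B, eta, eps, c=1.0):
    """c * (n^2 / eta) * B * log(1/eps); c = 1 is a placeholder, not a known value."""
    return c * (n ** 2 / eta) * B * math.log(1.0 / eps)

n, B, eta, eps = 10, 1, 0.1, 0.01     # eta = 1/n, as in the Metropolis model
ratio = exponential_bound(n, B, eta, eps) / polynomial_bound(n, B, eta, eps)
```

Algebraically the ratio is $(1/\eta)^{nB} \eta / n$, which for these parameters equals $10^{10} \cdot 0.1 / 10 = 10^8$.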

We now proceed to the task of proving this theorem. We first establish some 
technical preliminaries that will be key in the subsequent analysis. In particular, 
in the next subsection, we explore several implications of the double stochasticity 
assumption on the weight matrix A{t). 



4.1.1 Preliminaries on doubly stochastic matrices 

We begin by analyzing how the sample variance $V(x)$ changes when the vector
$x$ is multiplied by a doubly stochastic matrix $A$. The next lemma shows that
$V(Ax) \leq V(x)$. Thus, under Assumptions 3.1 and 3.4, the sample variance $V(x(t))$
is nonincreasing in $t$, and $V(x(t))$ can be used as a Lyapunov function.

Lemma 4.1. Let $A$ be a doubly stochastic matrix. Then for all $x \in \mathbb{R}^n$,

$$V(Ax) = V(x) - \sum_{i<j} w_{ij} (x_i - x_j)^2,$$

where $w_{ij}$ is the $(i,j)$-th entry of the matrix $A^T A$.

Proof. Let $\mathbf{1}$ denote the vector in $\mathbb{R}^n$ with all entries equal to 1. The double
stochasticity of $A$ implies

$$A\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}^T A = \mathbf{1}^T.$$

Note that multiplication by a doubly stochastic matrix $A$ preserves the average of the
entries of a vector, i.e., for any $x \in \mathbb{R}^n$, there holds

$$\overline{Ax} = \frac{1}{n} \mathbf{1}^T A x = \frac{1}{n} \mathbf{1}^T x = \bar{x}.$$

We now write the quadratic form $V(x) - V(Ax)$ explicitly, as follows:

$$\begin{aligned}
V(x) - V(Ax) &= (x - \bar{x}\mathbf{1})^T (x - \bar{x}\mathbf{1}) - (Ax - \overline{Ax}\,\mathbf{1})^T (Ax - \overline{Ax}\,\mathbf{1}) \\
&= (x - \bar{x}\mathbf{1})^T (x - \bar{x}\mathbf{1}) - (Ax - \bar{x} A\mathbf{1})^T (Ax - \bar{x} A\mathbf{1}) \\
&= (x - \bar{x}\mathbf{1})^T (I - A^T A)(x - \bar{x}\mathbf{1}).
\end{aligned} \tag{4.1}$$

Let $w_{ij}$ be the $(i,j)$-th entry of $A^T A$. Note that $A^T A$ is symmetric and stochastic,
so that $w_{ij} = w_{ji}$ and $w_{ii} = 1 - \sum_{j \ne i} w_{ij}$. Then, it can be verified that

$$A^T A = I - \sum_{i<j} w_{ij} (e_i - e_j)(e_i - e_j)^T, \tag{4.2}$$

where $e_i$ is a unit vector with the $i$-th entry equal to 1, and all other entries equal to
0 (see also [100] where a similar decomposition was used).
By combining Eqs. (4.1) and (4.2), we obtain

$$\begin{aligned}
V(x) - V(Ax) &= (x - \bar{x}\mathbf{1})^T \Big( \sum_{i<j} w_{ij} (e_i - e_j)(e_i - e_j)^T \Big) (x - \bar{x}\mathbf{1}) \\
&= \sum_{i<j} w_{ij} (x_i - x_j)^2.
\end{aligned}$$

□

^1 We say $c$ is an absolute constant when it does not depend on any of the parameters in the
problem, in this case $n$, $B$, $\eta$, $\epsilon$.

^2 In the sequel, the notation $\sum_{i<j}$ will be used to denote the double sum $\sum_{j=1}^{n} \sum_{i=1}^{j-1}$.
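The identity of Lemma 4.1 can be checked numerically. The following is a minimal sketch and not part of the original exposition: the helper names (`sample_variance`, `random_doubly_stochastic`, `variance_drop`) are hypothetical, and building $A$ as a convex combination of random permutation matrices is simply one convenient way, assumed here, to obtain a doubly stochastic matrix.

```python
import random

def sample_variance(x):
    """V(x) = sum_i (x_i - mean(x))^2."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)

def random_doubly_stochastic(n, num_perms=5, seed=0):
    """Illustrative construction: a convex combination of permutation matrices."""
    rng = random.Random(seed)
    coeffs = [rng.random() for _ in range(num_perms)]
    total = sum(coeffs)
    A = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        perm = list(range(n))
        rng.shuffle(perm)
        for i, j in enumerate(perm):
            A[i][j] += c / total
    return A

def variance_drop(A, x):
    """sum_{i<j} w_ij (x_i - x_j)^2, with w_ij the (i, j) entry of A^T A."""
    n = len(x)
    drop = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w_ij = sum(A[k][i] * A[k][j] for k in range(n))
            drop += w_ij * (x[i] - x[j]) ** 2
    return drop

n = 6
A = random_doubly_stochastic(n)
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

lhs = sample_variance(Ax)                       # V(Ax)
rhs = sample_variance(x) - variance_drop(A, x)  # V(x) - sum_{i<j} w_ij (x_i - x_j)^2
print("difference between the two sides:", abs(lhs - rhs))
```

Up to floating-point rounding, the two sides agree, and `lhs` never exceeds `sample_variance(x)`.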

Note that the entries $w_{ij}(t)$ of $A(t)^T A(t)$ are nonnegative, because the weight
matrix $A(t)$ has nonnegative entries. In view of this, Lemma 4.1 implies that

$$V(x(t+1)) \le V(x(t)) \quad \text{for all } t.$$

Moreover, the amount of variance decrease is given by

$$V(x(t)) - V(x(t+1)) = \sum_{i<j} w_{ij}(t) \big(x_i(t) - x_j(t)\big)^2.$$

We will use this result to provide a lower bound on the amount of decrease of the
sample variance $V(x(t))$ in between iterations.

Since every positive entry of $A(t)$ is at least $\eta$, it follows that every positive entry
of $A(t)^T A(t)$ is at least $\eta^2$. Therefore, it is immediate that

$$\text{if } w_{ij}(t) > 0, \text{ then } w_{ij}(t) \ge \eta^2.$$

In our next lemma, we establish a stronger lower bound. In particular, we find it
useful to focus not on an individual $w_{ij}$, but rather on all $w_{ij}$ associated with edges
$(i,j)$ that cross a particular cut in the graph $(N, \mathcal{E}(A^T A))$. For such groups of $w_{ij}$,
we prove a lower bound which is linear in $\eta$, as seen in the following.

Lemma 4.2. Let $A$ be a row-stochastic matrix with positive diagonal entries, and
assume that the smallest positive entry in $A$ is at least $\eta$. Also, let $(S^-, S^+)$ be a
partition of the set $N = \{1, \ldots, n\}$ into two disjoint sets. If

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} > 0,$$

then

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} \ge \frac{\eta}{2}.$$

Proof. Let $\sum_{i \in S^-,\ j \in S^+} w_{ij} > 0$. From the definition of the weights $w_{ij}$, we have
$w_{ij} = \sum_k a_{ki} a_{kj}$, which shows that there exist $i \in S^-$, $j \in S^+$, and some $k$ such that
$a_{ki} > 0$ and $a_{kj} > 0$. For either case where $k$ belongs to $S^-$ or $S^+$, we see that there
exists an edge in the set $\mathcal{E}(A)$ that crosses the cut $(S^-, S^+)$. Let $(i^*, j^*)$ be such an
edge. Without loss of generality, we assume that $i^* \in S^-$ and $j^* \in S^+$.
We define

$$C_{j^*}^+ = \sum_{j \in S^+} a_{j^* j}, \qquad C_{j^*}^- = \sum_{i \in S^-} a_{j^* i}.$$

See Figure 4-1(a) for an illustration. Since $A$ is a row-stochastic matrix, we have

$$C_{j^*}^- + C_{j^*}^+ = 1,$$

implying that at least one of the following is true:

Case (a): $C_{j^*}^- \ge \frac{1}{2}$,

Case (b): $C_{j^*}^+ \ge \frac{1}{2}$.

We consider these two cases separately. In both cases, we focus on a subset of the
edges and we use the fact that the elements $w_{ij}$ correspond to paths of length 2, with
one step in $\mathcal{E}(A)$ and another in $\mathcal{E}(A^T)$.



Case (a): $C_{j^*}^- \ge 1/2$.

We focus on those $w_{ij}$ with $i \in S^-$ and $j = j^*$. Indeed, since all $w_{ij}$ are nonnegative,
we have

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} \ge \sum_{i \in S^-} w_{i j^*}. \tag{4.3}$$

For each element in the sum on the right-hand side, we have

$$w_{i j^*} = \sum_{k=1}^{n} a_{ki}\, a_{k j^*} \ge a_{j^* i}\, a_{j^* j^*} \ge a_{j^* i}\, \eta,$$

where the inequalities follow from the facts that $A$ has nonnegative entries, its diagonal
entries are positive, and its positive entries are at least $\eta$. Consequently,

$$\sum_{i \in S^-} w_{i j^*} \ge \eta \sum_{i \in S^-} a_{j^* i} = \eta\, C_{j^*}^-. \tag{4.4}$$

Combining Eqs. (4.3) and (4.4), and recalling the assumption $C_{j^*}^- \ge 1/2$, the result
follows. An illustration of this argument can be found in Figure 4-1(b).

Figure 4-1: (a) Intuitively, $C_{j^*}^+$ measures how much weight $j^*$ assigns to nodes in $S^+$
(including itself), and $C_{j^*}^-$ measures how much weight $j^*$ assigns to nodes in $S^-$. Note that
the edge $(j^*, j^*)$ is also present, but not shown. (b) For the case where $C_{j^*}^- \ge 1/2$, we only
focus on two-hop paths between $j^*$ and elements $i \in S^-$ obtained by taking $(i, j^*)$ as the
first step and the self-edge $(j^*, j^*)$ as the second step. (c) For the case where $C_{j^*}^+ \ge 1/2$, we
only focus on two-hop paths between $i^*$ and elements $j \in S^+$ obtained by taking $(i^*, j^*)$ as
the first step in $\mathcal{E}(A)$ and $(j^*, j)$ as the second step in $\mathcal{E}(A^T)$.

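Case (b): $C_{j^*}^+ \ge 1/2$.

We focus on those $w_{ij}$ with $i = i^*$ and $j \in S^+$. We have

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} \ge \sum_{j \in S^+} w_{i^* j}, \tag{4.5}$$

since all $w_{ij}$ are nonnegative. For each element in the sum on the right-hand side, we
have

$$w_{i^* j} = \sum_{k=1}^{n} a_{k i^*}\, a_{k j} \ge a_{j^* i^*}\, a_{j^* j} \ge \eta\, a_{j^* j},$$

where the inequalities follow because all entries of $A$ are nonnegative, and because
the choice $(i^*, j^*) \in \mathcal{E}(A)$ implies that $a_{j^* i^*} \ge \eta$. Consequently,

$$\sum_{j \in S^+} w_{i^* j} \ge \eta \sum_{j \in S^+} a_{j^* j} = \eta\, C_{j^*}^+. \tag{4.6}$$

Combining Eqs. (4.5) and (4.6), and recalling the assumption $C_{j^*}^+ \ge 1/2$, the result
follows. An illustration of this argument can be found in Figure 4-1(c). □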
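As a sanity check on Lemma 4.2, the cut bound can be tested exhaustively on a small example. This is only a sketch under stated assumptions: the function names and the way the random matrix is generated (each row distributes `units` equal chunks of mass, one of them always on the diagonal, so that every positive entry is at least $\eta = 1/\texttt{units}$) are hypothetical choices made for illustration.

```python
import itertools
import random

def random_row_stochastic(n, units, seed=0):
    """Row-stochastic matrix with positive diagonal; every positive entry is >= 1/units."""
    rng = random.Random(seed)
    A = []
    for i in range(n):
        counts = [0] * n
        counts[i] = 1                       # guarantees a positive diagonal entry
        for _ in range(units - 1):          # spread the remaining mass at random
            counts[rng.randrange(n)] += 1
        A.append([c / units for c in counts])
    return A

def cut_weight(A, S_minus):
    """Sum of w_ij over i in S^-, j in S^+, where w_ij = sum_k a_ki a_kj."""
    n = len(A)
    S_plus = [j for j in range(n) if j not in S_minus]
    return sum(A[k][i] * A[k][j] for i in S_minus for j in S_plus for k in range(n))

n, units = 5, 4
eta = 1.0 / units
A = random_row_stochastic(n, units)

# Enumerate every partition (S^-, S^+) with both sides nonempty.
cut_weights = [
    cut_weight(A, set(S))
    for size in range(1, n)
    for S in itertools.combinations(range(n), size)
]
positive = [w for w in cut_weights if w > 0]
print("every positive cut weight is at least eta/2:",
      all(w >= eta / 2 for w in positive))
```

Note that the lemma only constrains cuts whose weight is positive, which is why cuts of weight zero are filtered out before the comparison.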
4.1.2 A bound on convergence time



With the preliminaries on doubly stochastic matrices in place, we can now proceed to
derive bounds on the decrease of $V(x(t))$ in between iterations. We will first somewhat
relax our connectivity assumptions. In particular, we consider the following relaxation
of Assumption 3.2.



Assumption 4.1 (Relaxed connectivity). Given an integer $t \ge 0$, suppose that the
components of $x(tB)$ have been reordered so that they are in nonincreasing order. We
assume that for every $d \in \{1, \ldots, n-1\}$, we either have $x_d(tB) = x_{d+1}(tB)$, or there
exist some time $t' \in \{tB, \ldots, (t+1)B - 1\}$ and some $i \in \{1, \ldots, d\}$, $j \in \{d+1, \ldots, n\}$
such that $(i,j)$ or $(j,i)$ belongs to $\mathcal{E}(A(t'))$.



Lemma 4.3. Assumption 3.2 implies Assumption 4.1, with the same value of $B$.

Proof. If Assumption 4.1 does not hold, then there must exist an index $d$ [for which
$x_d(tB) \ne x_{d+1}(tB)$ holds] such that there are no edges between nodes $1, 2, \ldots, d$ and
nodes $d+1, \ldots, n$ during times $t' = tB, \ldots, (t+1)B - 1$. But this implies that the
graph

$$\big(N,\ \mathcal{E}(A(tB)) \cup \mathcal{E}(A(tB+1)) \cup \cdots \cup \mathcal{E}(A((t+1)B - 1))\big)$$

is disconnected, which violates Assumption 3.2. □



For our convergence time results, we will use the weaker Assumption 4.1, rather
than the stronger Assumption 3.2. Later on, in Chapter 5, we will exploit the
sufficiency of Assumption 4.1 to design a decentralized algorithm for selecting the weights
$a_{ij}(t)$, which satisfies Assumption 4.1, but not Assumption 3.2.

We now proceed to bound the decrease of our Lyapunov function $V(x(t))$ during
the interval $[tB, (t+1)B - 1]$. In what follows, we denote by $V(t)$ the sample variance
$V(x(t))$ at time $t$.



Lemma 4.4. Let Assumptions 3.1 (non-vanishing weights), 3.4 (double stochasticity),
and 4.1 (relaxed connectivity) hold. Let $\{x(t)\}$ be generated by the update rule (3.1).
Suppose that the components $x_i(tB)$ of the vector $x(tB)$ have been ordered from largest
to smallest, with ties broken arbitrarily. Then,

$$V(tB) - V((t+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \big(x_i(tB) - x_{i+1}(tB)\big)^2.$$



Proof. By Lemma 4.1, we have for all $t$,

$$V(t) - V(t+1) = \sum_{i<j} w_{ij}(t) \big(x_i(t) - x_j(t)\big)^2, \tag{4.7}$$

where $w_{ij}(t)$ is the $(i,j)$-th entry of $A(t)^T A(t)$. Summing up the variance differences
$V(t) - V(t+1)$ over different values of $t$, we obtain

$$V(tB) - V((t+1)B) = \sum_{k=tB}^{(t+1)B-1} \sum_{i<j} w_{ij}(k) \big(x_i(k) - x_j(k)\big)^2. \tag{4.8}$$

We next introduce some notation. 

(a) For all $d \in \{1, \ldots, n-1\}$, let $t_d$ be the first time larger than or equal to $tB$ (if
it exists) at which there is a communication between two nodes belonging to
the two sets $\{1, \ldots, d\}$ and $\{d+1, \ldots, n\}$, to be referred to as a communication
across the cut $d$.

(b) For all $t' \in \{tB, \ldots, (t+1)B - 1\}$, let $D(t') = \{d \mid t_d = t'\}$, i.e., $D(t')$
consists of "cuts" $d \in \{1, \ldots, n-1\}$ such that time $t'$ is the first communication
time larger than or equal to $tB$ between nodes in the sets $\{1, \ldots, d\}$ and
$\{d+1, \ldots, n\}$. Because of Assumption 4.1, the union of the sets $D(t')$ includes
all indices $1, \ldots, n-1$, except possibly for indices for which $x_d(tB) = x_{d+1}(tB)$.

(c) For all $d \in \{1, \ldots, n-1\}$, let $C_d = \{(i,j), (j,i) \mid i \le d,\ d+1 \le j\}$.

(d) For all $t' \in \{tB, \ldots, (t+1)B - 1\}$, let $F_{ij}(t') = \{d \in D(t') \mid (i,j) \text{ or } (j,i) \in C_d\}$,
i.e., $F_{ij}(t')$ consists of all cuts $d$ such that the edge $(i,j)$ or $(j,i)$ at time $t'$ is the
first communication across the cut at a time larger than or equal to $tB$.

(e) To simplify notation, let $y_i = x_i(tB)$. By assumption, we have $y_1 \ge \cdots \ge y_n$.

We make two observations, as follows:

(1) Suppose that $d \in D(t')$. Then, for some $(i,j) \in C_d$, we have either $a_{ij}(t') > 0$ or
$a_{ji}(t') > 0$. Because $A(t')$ is nonnegative with positive diagonal entries, we have

$$w_{ij}(t') = \sum_{k=1}^{n} a_{ki}(t')\, a_{kj}(t') \ge a_{ii}(t')\, a_{ij}(t') + a_{ji}(t')\, a_{jj}(t') > 0,$$

and by Lemma 4.2, we obtain

$$\sum_{(i,j) \in C_d,\ i<j} w_{ij}(t') \ge \frac{\eta}{2}. \tag{4.9}$$

(2) Fix some $(i,j)$ with $i < j$, and time $t' \in \{tB, \ldots, (t+1)B - 1\}$, and suppose
that $F_{ij}(t')$ is nonempty. Let $F_{ij}(t') = \{d_1, \ldots, d_k\}$, where the $d_\ell$ are arranged
in increasing order. Since $d_1 \in F_{ij}(t')$, we have $d_1 \in D(t')$ and therefore $t_{d_1} = t'$.
By the definition of $t_{d_1}$, this implies that there has been no communication
between a node in $\{1, \ldots, d_1\}$ and a node in $\{d_1 + 1, \ldots, n\}$ during the time
interval $[tB, t' - 1]$. It follows that $x_i(t') \ge y_{d_1}$. By a symmetrical argument,
we also have

$$x_j(t') \le y_{d_k + 1}. \tag{4.10}$$

These relations imply that

$$x_i(t') - x_j(t') \ge y_{d_1} - y_{d_k + 1} \ge \sum_{d \in F_{ij}(t')} (y_d - y_{d+1}).$$

Since the components of $y$ are sorted in nonincreasing order, we have $y_d - y_{d+1} \ge 0$,
for every $d \in F_{ij}(t')$. For any nonnegative numbers $z_i$, we have

$$(z_1 + \cdots + z_k)^2 \ge z_1^2 + \cdots + z_k^2,$$

which implies that

$$\big(x_i(t') - x_j(t')\big)^2 \ge \sum_{d \in F_{ij}(t')} (y_d - y_{d+1})^2. \tag{4.11}$$



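We now use these two observations to provide a lower bound on the expression on
the right-hand side of Eq. (4.7) at time $t'$. We use Eq. (4.11) and then Eq. (4.9), to
obtain

$$\begin{aligned}
\sum_{i<j} w_{ij}(t') \big(x_i(t') - x_j(t')\big)^2 &\ge \sum_{i<j} w_{ij}(t') \sum_{d \in F_{ij}(t')} (y_d - y_{d+1})^2 \\
&= \sum_{d \in D(t')}\ \sum_{(i,j) \in C_d,\ i<j} w_{ij}(t') (y_d - y_{d+1})^2 \\
&\ge \frac{\eta}{2} \sum_{d \in D(t')} (y_d - y_{d+1})^2.
\end{aligned}$$

We now sum both sides of the above inequality for different values of $t'$, and use Eq.
(4.8), to obtain

$$\begin{aligned}
V(tB) - V((t+1)B) &= \sum_{k=tB}^{(t+1)B-1} \sum_{i<j} w_{ij}(k) \big(x_i(k) - x_j(k)\big)^2 \\
&\ge \frac{\eta}{2} \sum_{k=tB}^{(t+1)B-1} \sum_{d \in D(k)} (y_d - y_{d+1})^2 \\
&= \frac{\eta}{2} \sum_{d=1}^{n-1} (y_d - y_{d+1})^2,
\end{aligned}$$

where the last step follows from the fact that the union of the sets $D(k)$ is only
missing those $d$ for which $y_d = y_{d+1}$. □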
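The bound of Lemma 4.4 can be observed on a single step of a fixed connected graph, which corresponds to $B = 1$. The sketch below is illustrative only: the graph (a path plus a few random chords), the symmetric weights $a_{ij} = 1/(1 + \max(\deg i, \deg j))$, and all function names are assumptions chosen for this example, not the construction used in the text.

```python
import random

def sample_variance(x):
    """V(x) = sum_i (x_i - mean(x))^2."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)

def symmetric_weights(n, edges):
    """Symmetric, hence doubly stochastic, matrix with positive diagonal (assumed weights)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        A[i][i] = 1.0 - sum(A[i])           # at least 1 / (1 + deg[i]) > 0
    return A

rng = random.Random(0)
n = 8
edges = {(i, i + 1) for i in range(n - 1)}  # the path keeps the graph connected
edges |= {tuple(sorted(rng.sample(range(n), 2))) for _ in range(4)}
A = symmetric_weights(n, sorted(edges))
eta = min(a for row in A for a in row if a > 0)

x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

decrease = sample_variance(x) - sample_variance(Ax)
y = sorted(x, reverse=True)                 # components in nonincreasing order
bound = (eta / 2) * sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
print("variance decrease:", decrease, "lower bound from Lemma 4.4:", bound)
```

Because the graph is connected at every step, Assumption 4.1 holds with $B = 1$, so the observed decrease should never fall below the bound.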



We next establish a bound on the variance decrease that plays a key role in our
convergence analysis.

Lemma 4.5. Let Assumptions 3.1 (non-vanishing weights), 3.4 (double stochasticity),
and 4.1 (connectivity relaxation) hold, and suppose that $V(tB) > 0$. Then,

$$\frac{V(tB) - V((t+1)B)}{V(tB)} \ge \frac{\eta}{2n^2}.$$

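Proof. Without loss of generality, we assume that the components of $x(tB)$ have been
sorted in nonincreasing order. By Lemma 4.4, we have

$$V(tB) - V((t+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \big(x_i(tB) - x_{i+1}(tB)\big)^2.$$

This implies that

$$\frac{V(tB) - V((t+1)B)}{V(tB)} \ge \frac{\eta}{2}\, \frac{\sum_{i=1}^{n-1} \big(x_i(tB) - x_{i+1}(tB)\big)^2}{\sum_{i=1}^{n} \big(x_i(tB) - \bar{x}(tB)\big)^2}.$$

Observe that the right-hand side does not change when we add a constant to every
$x_i(tB)$. We can therefore assume, without loss of generality, that $\bar{x}(tB) = 0$, so that

$$\frac{V(tB) - V((t+1)B)}{V(tB)} \ge \frac{\eta}{2} \min_{\substack{x_1 \ge x_2 \ge \cdots \ge x_n \\ \sum_i x_i = 0}} \frac{\sum_{i=1}^{n-1} (x_i - x_{i+1})^2}{\sum_{i=1}^{n} x_i^2}.$$

Note that the right-hand side is unchanged if we multiply each $x_i$ by the same
constant. Therefore, we can assume, without loss of generality, that $\sum_{i=1}^{n} x_i^2 = 1$, so
that

$$\frac{V(tB) - V((t+1)B)}{V(tB)} \ge \frac{\eta}{2} \min_{\substack{x_1 \ge x_2 \ge \cdots \ge x_n \\ \sum_i x_i = 0,\ \sum_i x_i^2 = 1}} \sum_{i=1}^{n-1} (x_i - x_{i+1})^2. \tag{4.12}$$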

The requirement $\sum_{i=1}^{n} x_i^2 = 1$ implies that the average value of $x_i^2$ is $1/n$, which implies
that there exists some $j$ such that $|x_j| \ge 1/\sqrt{n}$. Without loss of generality, let us
suppose that this $x_j$ is positive.^3

The rest of the proof relies on a technique from [66] to provide a lower bound on
the right-hand side of Eq. (4.12). Let

$$z_i = x_i - x_{i+1} \ \text{ for } i < n, \quad \text{and} \quad z_n = 0.$$

Note that $z_i \ge 0$ for all $i$ and

$$\sum_{i=1}^{n} z_i = x_1 - x_n.$$

Since $x_j \ge 1/\sqrt{n}$ for some $j$, we have that $x_1 \ge 1/\sqrt{n}$; since $\sum_{i=1}^{n} x_i = 0$, it follows
that at least one $x_i$ is negative, and therefore $x_n < 0$. This gives us

$$\sum_{i=1}^{n} z_i \ge \frac{1}{\sqrt{n}}.$$

Combining with Eq. (4.12), we obtain

$$\frac{V(tB) - V((t+1)B)}{V(tB)} \ge \frac{\eta}{2} \min_{z_i \ge 0,\ \sum_i z_i \ge 1/\sqrt{n}}\ \sum_{i=1}^{n} z_i^2.$$

The minimization problem on the right-hand side is a symmetric convex optimization
problem, and therefore has a symmetric optimal solution, namely $z_i = 1/n^{1.5}$ for all
$i$. This results in an optimal value of $1/n^2$. Therefore,

$$\frac{V(tB) - V((t+1)B)}{V(tB)} \ge \frac{\eta}{2n^2},$$

which is the desired result. □

^3 Otherwise, we can replace $x$ with $-x$ and subsequently reorder to maintain the property that
the components of $x$ are in descending order. It can be seen that these operations do not affect the
objective value.
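The inequality at the heart of this proof, namely that the sum of squared consecutive gaps of a sorted vector is at least $1/n^2$ times its sample variance, can be probed numerically. The sketch below is an illustration under assumptions: the function name `gap_ratio` and the choice of Gaussian test vectors are hypothetical, and the linear ramp is included only because its ratio has the simple closed form $12/(n(n+1))$.

```python
import random

def gap_ratio(x):
    """sum_i (y_i - y_{i+1})^2 / sum_i (y_i - mean)^2 for y = x sorted in nonincreasing order."""
    y = sorted(x, reverse=True)
    mean = sum(y) / len(y)
    gaps = sum((y[i] - y[i + 1]) ** 2 for i in range(len(y) - 1))
    variance = sum((v - mean) ** 2 for v in y)
    return gaps / variance

n = 20
rng = random.Random(0)
ratios = [gap_ratio([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(1000)]
ramp = gap_ratio(list(range(n)))  # equally spaced values

print("smallest ratio over random trials:", min(ratios))
print("ratio for the ramp:", ramp, "versus 1/n^2 =", 1.0 / n ** 2)
```

The ramp shows that the ratio can indeed be as small as order $1/n^2$, so the dependence on $n$ in Lemma 4.5 cannot be improved by this argument beyond a constant factor.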



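We are now ready for our main result, which establishes that the convergence time
of the sequence of vectors $x(t)$ generated by Eq. (3.1) is of order $O(n^2 B/\eta)$.

Theorem 4.2. Let Assumptions 3.1 (non-vanishing weights), 3.4 (double stochasticity),
and 4.1 (connectivity relaxation) hold. Then there exists an absolute constant $c$
such that we have

$$V(t) \le \epsilon V(0) \quad \text{for all } t \ge c\,(n^2/\eta)\, B \log(1/\epsilon).$$

Proof. The result follows immediately from Lemma 4.5. Indeed, the lemma gives
$V((k+1)B) \le (1 - \eta/(2n^2))\, V(kB)$ for every integer $k \ge 0$, so that
$V(kB) \le (1 - \eta/(2n^2))^k\, V(0) \le e^{-k\eta/(2n^2)}\, V(0)$, which is at most $\epsilon V(0)$ once
$k \ge (2n^2/\eta) \log(1/\epsilon)$. Since $V(t)$ is nonincreasing in $t$, the bound then holds at all
later times as well. □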



Recall that, according to Lemma 4.3, Assumption 3.2 implies Assumption 4.1. In
view of this, the convergence time bound of Theorem 4.2 holds for any $n$ and any
sequence of weights satisfying Assumptions 3.1 (non-vanishing weights), 3.4 (double
stochasticity), and 3.2 ($B$-connectivity). This proves Theorem 4.1 from the beginning
of this chapter.
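To see how the per-block contraction of Lemma 4.5 translates into the $O((n^2/\eta) B \log(1/\epsilon))$ bound, one can count the number of length-$B$ blocks needed for the factor $(1 - \eta/(2n^2))^k$ to drop below $\epsilon$. The sketch below is a back-of-the-envelope illustration; the function names and the sample parameter values are assumptions, and the constant it produces is not claimed to be the constant $c$ of the theorem.

```python
import math

def blocks_needed(n, eta, eps):
    """Smallest integer k with (1 - eta / (2 n^2))^k <= eps."""
    rate = 1.0 - eta / (2.0 * n * n)
    return math.ceil(math.log(eps) / math.log(rate))

def closed_form(n, eta, eps):
    """(2 n^2 / eta) * log(1 / eps), obtained from 1 - u <= exp(-u)."""
    return (2.0 * n * n / eta) * math.log(1.0 / eps)

eta, eps, B = 1.0 / 3.0, 0.01, 2
results = {n: (blocks_needed(n, eta, eps), closed_form(n, eta, eps)) for n in (5, 10, 50)}
for n, (k, bound) in results.items():
    # Multiplying the number of blocks by B gives a sufficient number of iterations.
    print(f"n={n}: {k} blocks ({k * B} iterations), closed-form bound {bound:.1f} blocks")
```

The exact block count always sits at or just below the closed-form value, and both grow quadratically in $n$.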



4.2 Concluding remarks 

In this chapter, we have presented a polynomial convergence-time bound on the per- 
formance of a class of averaging algorithms. Several open research directions naturally 
present themselves. 

First, is it possible to design faster algorithms which nevertheless compute
averages correctly on arbitrary (time-varying, undirected) graph sequences? We make
some headway on this question in Chapter 5, where we design an algorithm whose
convergence time scales as $O(n^2)$ with $n$, and in Chapter 6, where we prove a lower bound
of $\Omega(n^2)$ for a somewhat restricted class of algorithms. However, the general
question of how fast averaging algorithms can scale with $n$ is still open.

Secondly, where is the dividing line between polynomial and exponential convergence
time? In particular, how far may we relax the double stochasticity Assumption 3.4
while still having polynomial convergence time? For example, does polynomial time
convergence still hold if we replace Assumption 3.4 with the requirement that the
matrices $A(t)$ be (row) stochastic and each column sum is in $[1 - \epsilon, 1 + \epsilon]$ for some
small $\epsilon > 0$?
Chapter 5

Averaging in quadratic time



In the previous chapter, we have shown that a large class of averaging algorithms have
$O(B (n^2/\eta) \log(1/\epsilon))$ convergence time.

In this chapter, we consider decentralized ways of synthesizing the weights $a_{ij}(t)$
while satisfying Assumptions 3.1, 3.4, and 4.1. We assume that the sequence of
(undirected) graphs $G(t) = (N, E(t))$ is given exogenously, but the nodes can pick the
coefficients $a_{ij}(t)$. Our focus is on improving convergence time bounds by constructing
"good" schemes.

Naturally, several ways to pick the coefficients present themselves. For example,
each node may assign

$$a_{ij}(t) = \epsilon \ \text{ if } (j,i) \in E(t) \text{ and } i \ne j, \qquad a_{ii}(t) = 1 - \epsilon \cdot \deg(i),$$

where $\deg(i)$ is the degree of $i$ in $G(t)$. If $\epsilon$ is small enough and the graph $G(t)$ is
undirected [i.e., $(i,j) \in E(t)$ if and only if $(j,i) \in E(t)$], this results in a nonnegative,
doubly stochastic matrix (see [S]). The Metropolis algorithm from Chapter 2 is a
special case of this method. However, if a node has $\Theta(n)$ neighbors, $\eta$ will be of order
$O(1/n)$, resulting in $\Theta(n^3)$ convergence time. Moreover, this argument applies to all
protocols in which nodes assign equal weights to all their neighbors; see and [TB]
for more examples.

In this chapter, we examine whether it is possible to synthesize the weights $a_{ij}(t)$
in a decentralized manner, so that the above convergence time is reduced. Our main
result shaves a factor of $n$ off the convergence time of the previous paragraph.

Theorem 5.1. Suppose $G(t) = (N, E(t))$ is a sequence of undirected graphs such that
$(N, E(tB) \cup E(tB+1) \cup \cdots \cup E((t+1)B - 1))$ is connected, for all integers $t$. Then,
there exists a decentralized way to pick the coefficients $a_{ij}(t)$ such that

$$V(t) \le \epsilon V(0) \quad \text{for all } t \ge c\, n^2 B \log(1/\epsilon),$$

where $c$ is an absolute constant.

This theorem has appeared in the paper [75], which we will follow here.
Our approach will be to pick $a_{ij}(t)$ so that $a_{ij}(t) \ge \eta$ whenever $a_{ij}(t) \ne 0$,
where $\eta$ is a positive constant independent of $n$ and $B$. We show that this is indeed
possible, under the additional assumption that the graphs $G(t)$ are undirected (in
Chapter 3, we referred to this as the "symmetric model"). Our algorithm is
data-dependent, in that $a_{ij}(t)$ depends not only on the graph $G(t)$, but also on the
data vector $x(t)$. Furthermore, it is a decentralized 3-hop algorithm, in that $a_{ij}(t)$
depends only on the data at nodes within a distance of at most 3 from $i$. Our algorithm
is such that the resulting sequences of vectors $x(t)$ and graphs $G(t) = (N, \mathcal{E}(t))$, with
$\mathcal{E}(t) = \{(j,i) \mid a_{ij}(t) > 0\}$, satisfy Assumptions 3.1, 3.4, and 4.1. Thus, a convergence
time result can be obtained from Theorem 4.2.
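To make concrete why equal-weight schemes lose a factor of $n$, consider the scheme from the beginning of this chapter ($a_{ij} = \epsilon$ on edges, $a_{ii} = 1 - \epsilon \cdot \deg(i)$) on a star graph. The sketch below is illustrative: the function name, the star topology, and the choice $\epsilon = 1/n$ are assumptions made for this example.

```python
def equal_weight_matrix(n, edges, eps):
    """a_ij = eps for every undirected edge {i, j}; a_ii = 1 - eps * deg(i)."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = eps
    for i in range(n):
        degree = sum(1 for j in range(n) if j != i and A[i][j] > 0)
        A[i][i] = 1.0 - eps * degree
    return A

n = 6
star = [(0, j) for j in range(1, n)]  # node 0 has n - 1 neighbors
eps = 1.0 / n                         # keeps the diagonal entry of the hub positive
A = equal_weight_matrix(n, star, eps)

row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
eta = min(a for row in A for a in row if a > 0)
print("row sums:", row_sums)
print("column sums:", col_sums)
print("smallest positive entry eta:", eta, "compared with 1/n =", 1.0 / n)
```

The matrix is doubly stochastic, but its smallest positive entry is of order $1/n$, so plugging it into the $O((n^2/\eta) B \log(1/\epsilon))$ bound yields cubic rather than quadratic scaling.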

5.1 The algorithm 

The algorithm we present here is a variation of an old load balancing algorithm (see
[35] and Chapter 7.3 of [13]).^1

At each step of the algorithm, each node offers some of its value to its neighbors,
and accepts or rejects such offers from its neighbors. Once an offer from $i$ to $j$, of size
$\delta > 0$, has been accepted, the updates $x_i \leftarrow x_i - \delta$ and $x_j \leftarrow x_j + \delta$ are executed.

We next describe the formal steps the nodes execute at each time t. For clarity, 
we refer to the node executing the steps below as node C. Moreover, the instructions 
below sometimes refer to the neighbors of node C; this always means current neighbors 
at time t, when the step is being executed, as determined by the current graph G{t). 
We assume that at each time t, all nodes execute these steps in the order described 
below, while the graph remains unchanged. 

Balancing Algorithm: 

1. Node C broadcasts its current value $x_C$ to all its neighbors.

2. Going through the values it just received from its neighbors, node C finds the
smallest value that is less than its own. Let D be a neighbor with this value.
Node C makes an offer of $(x_C - x_D)/3$ to node D.

If no neighbor of C has a value smaller than $x_C$, node C does nothing at this
stage.

3. Node C goes through the incoming offers. It sends an acceptance to the sender
of a largest offer, and a rejection to all the other senders. It updates the value
of $x_C$ by adding the value of the accepted offer.

If node C did not receive any offers, it does nothing at this stage.

4. If an acceptance arrives for the offer made by node C, node C updates $x_C$ by
subtracting the value of the offer.

Note that the new value of each node is a linear combination of the values of its
neighbors. Furthermore, the weights $a_{ij}(t)$ are completely determined by the data
and the graph at most 3 hops from node $i$ in $G(t)$. This is true because in the course
of execution of the above steps, each node makes at most three transmissions to its
neighbors, so the new value of node C cannot depend on information more than 3
hops away from C.

^1 This algorithm was also considered in [85], but in the absence of a result such as Theorem 4.2,
a weaker convergence time bound was derived.
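The four steps of the balancing algorithm can be sketched as a single synchronous round. This is a minimal illustration rather than a reference implementation: the function name `balancing_round`, the adjacency-list representation, and the rule of breaking ties toward the lowest node index are assumptions (the text allows any tie-breaking).

```python
def balancing_round(x, neighbors):
    """One synchronous round of the balancing algorithm on the current graph.

    x: list of node values; neighbors[c]: list of current neighbors of node c.
    Ties are broken toward the lowest node index (an assumption of this sketch).
    """
    n = len(x)

    # Steps 1-2: after the broadcast, each node offers a third of the gap to a
    # neighbor holding the smallest value below its own, if there is one.
    offers = {}  # sender -> (receiver, size)
    for c in range(n):
        lower = [d for d in neighbors[c] if x[d] < x[c]]
        if lower:
            d = min(lower, key=lambda v: (x[v], v))
            offers[c] = (d, (x[c] - x[d]) / 3.0)

    # Step 3: each receiver accepts a largest incoming offer and rejects the rest.
    accepted = {}  # receiver -> (sender, size)
    for c, (d, size) in offers.items():
        if d not in accepted or size > accepted[d][1]:
            accepted[d] = (c, size)

    # Steps 3-4: the receiver adds the accepted amount, the sender subtracts it.
    new_x = list(x)
    for d, (c, size) in accepted.items():
        new_x[d] += size
        new_x[c] -= size
    return new_x

line = [[1], [0, 2], [1, 3], [2]]  # fixed line graph on four nodes
x0 = [9.0, 3.0, 6.0, 0.0]
x1 = balancing_round(x0, line)
print(x1)

x = x0
for _ in range(200):
    x = balancing_round(x, line)
print(x)
```

In the first round, node 0 offers 2 to node 1 and node 2 offers 2 to node 3; both offers are accepted, so the sum of the values is unchanged while their spread shrinks. Repeating the round drives all values toward the average 4.5.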

5.2 Performance analysis 

The following theorem (stated at the beginning of the chapter as Theorem 5.1) allows
us to remove a factor of $n$ from the worst-case convergence time bounds of Theorem
4.2.

Theorem 5.2. Consider the balancing algorithm, and suppose that $G(t) = (N, E(t))$
is a sequence of undirected graphs such that $(N, E(tB) \cup E(tB+1) \cup \cdots \cup E((t+1)B - 1))$
is connected, for all integers $t$. There exists an absolute constant $c$ such that we have

$$V(t) \le \epsilon V(0) \quad \text{for all } t \ge c\, n^2 B \log(1/\epsilon).$$

Proof. Note that with this algorithm, the new value at some node $i$ is a convex
combination of the previous values of itself and its neighbors. Furthermore, the
algorithm keeps the sum of the nodes' values constant, because every accepted offer
involves an increase at the receiving node equal to the decrease at the offering node.
These two properties imply that the algorithm can be written in the form

$$x(t+1) = A(t)\, x(t),$$

where $A(t)$ is a doubly stochastic matrix, determined by $G(t)$ and $x(t)$. It can be seen
that the diagonal entries of $A(t)$ are positive and, furthermore, all nonzero entries of
$A(t)$ are larger than or equal to 1/3; thus, $\eta = 1/3$.

We claim that the algorithm [in particular, the sequence $\mathcal{E}(A(t))$] satisfies
Assumption 4.1. Indeed, suppose that at time $tB$, the nodes are reordered so that the
values $x_i(tB)$ are nonincreasing in $i$. Fix some $d \in \{1, \ldots, n-1\}$, and suppose that
$x_d(tB) \ne x_{d+1}(tB)$. Let $S^+ = \{1, \ldots, d\}$ and $S^- = \{d+1, \ldots, n\}$.

Because of our assumptions on the graphs $G(t)$, there will be a first time $t'$ in the
interval $\{tB, \ldots, (t+1)B - 1\}$, at which there is an edge in $E(t')$ between some $i^* \in S^+$
and $j^* \in S^-$. Note that between times $tB$ and $t'$, the two sets of nodes, $S^+$ and $S^-$,
do not interact, which implies that $x_i(t') \ge x_d(tB)$, for $i \in S^+$, and $x_j(t') < x_d(tB)$,
for $j \in S^-$.

At time $t'$, node $i^*$ sends an offer to a neighbor with the smallest value; let us
denote that neighbor by $k^*$. Since $(i^*, j^*) \in E(t')$, we have $x_{k^*}(t') \le x_{j^*}(t') < x_d(tB)$,
which implies that $k^* \in S^-$. Node $k^*$ will accept the largest offer it receives, which
must come from a node with a value no smaller than $x_{i^*}(t')$, and therefore no smaller
than $x_d(tB)$; hence the latter node belongs to $S^+$. It follows that $\mathcal{E}(A(t'))$ contains
an edge between $k^*$ and some node in $S^+$, showing that Assumption 4.1 is satisfied.

The claimed result follows from Theorem 4.2, because we have shown that all of
the assumptions in that theorem are satisfied with $\eta = 1/3$. □
5.3 Concluding remarks

In this chapter, we have presented a specific averaging algorithm whose scaling with
the number of nodes $n$ is $O(n^2)$. An interesting direction to explore is to what extent,
if any, this can be reduced. The next chapter provides a partial negative answer to
this question, showing that one cannot improve on quadratic scaling with a limited
class of algorithms.






Chapter 6

On the optimality of quadratic time



The goal of this chapter is to analyze the fundamental limitations of the kind of 
distributed averaging algorithms we have been studying. The previous chapter de- 
scribed a class of averaging algorithms whose convergence time scales with the number 
of agents $n$ as $O(n^2)$. Our aim in this chapter is to show that this is the best
scaling for a common class of such algorithms; namely, that any distributed averaging
algorithm that uses a single scalar state variable at each agent and satisfies a nat- 
ural "smoothness" condition will have this property. Our exposition will follow the 
preprint [79] where these results have previously appeared. 

We next proceed to define the class of distributed averaging algorithms we are 
considering and informally state our result. 

6.1 Background and basic definitions. 

Definition of local averaging algorithms: Agents $1, \ldots, n$ begin with real
numbers $x_1(0), \ldots, x_n(0)$ stored in memory. At each round $t = 0, 1, 2, \ldots$, agent $i$
broadcasts $x_i(t)$ to each of its neighbors in some undirected graph $G(t) = (\{1, \ldots, n\}, E(t))$,
and then sets $x_i(t+1)$ to be some function of $x_i(t)$ and of the values $x_{i'}(t), x_{i''}(t), \ldots$
it has just received from its own neighbors:

$$x_i(t+1) = f_{i,G(t)}\big(x_i(t), x_{i'}(t), x_{i''}(t), \ldots\big). \tag{6.1}$$

We require each $f_{i,G(t)}$ to be a differentiable function. Each agent uses the incoming
messages $x_{i'}(t), x_{i''}(t), \ldots$ as the arguments of $f_{i,G(t)}$ in some arbitrary order; we
assume that this order does not change, i.e., if $G(t_1) = G(t_2)$, then the message coming
from the same neighbor of agent $i$ is mapped to the same argument of $f_{i,G(t)}$ for $t = t_1$
and $t = t_2$. It is desired that

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{j=1}^{n} x_j(0), \tag{6.2}$$

for every $i$, for every sequence of graphs $G(t)$ having the property that

$$\text{the graph } \Big(\{1, \ldots, n\},\ \bigcup_{s \ge t} E(s)\Big) \text{ is connected for every } t, \tag{6.3}$$

and for every possible way for the agents to map incoming messages to arguments of
$f_{i,G(t)}$.

In words, as the number of rounds $t$ approaches infinity, iteration (6.1) must
converge to the average of the numbers $x_1(0), \ldots, x_n(0)$. Note that the agents have no
control over the communication graph sequence $G(t)$, which is exogenously provided
by "nature." However, as we stated previously, every element of the sequence $G(t)$
must be undirected: this corresponds to symmetric models of communication between
agents. Moreover, the sequence $G(t)$ must satisfy the mild connectivity condition of
Eq. (6.3), which says that the network cannot become disconnected after a finite
period.

Local averaging algorithms are useful tools for information fusion due to their
efficient utilization of resources (each agent stores only a single number in memory)
as well as their robustness properties (the sequence of graphs $G(t)$ is time-varying,
and it only needs to satisfy the relatively weak connectivity condition in Eq. (6.3)
for the convergence in Eq. (6.2) to hold). As explained in Chapter 2, no other class
of schemes for averaging (e.g., flooding, fusion along a spanning tree) is known to
produce similar results under the same assumptions.

Remark: As can be seen from the subscripts, the update function $f_{i,G(t)}$ is allowed to
depend on the agent and on the graph. Some dependence on the graph is unavoidable
since in different graphs an agent may have a different number of neighbors, in which
case nodes will receive a different number of messages, so that even the number of
arguments of $f_{i,G(t)}$ will depend on $G(t)$. It is often practically desired that $f_{i,G(t)}$
depend only weakly on the graph, as the entire graph may be unknown to agent $i$.
For example, we might require that $f_{i,G(t)}$ be completely determined by the degree of
$i$ in $G(t)$. However, since our focus is on what distributed algorithms cannot do, it
does not hurt to assume the agents have unrealistically rich information; thus we will
not assume any restrictions on how $f_{i,G(t)}$ depends on $G(t)$.

Remark: We require the functions $f_{i,G(t)}$ to be smooth, for the following reasons.
First, we need to exclude unnatural algorithms that encode vector information in
the infinitely many bits of a single real number. Second, although we make the
convenient technical assumption that agents can transmit and store real numbers, we
must be aware that in practice agents will transmit and store a quantized version of
$x_i(t)$. Thus, we are mostly interested in algorithms that are not disrupted much by
quantization. For this reason, we must prohibit the agents from using discontinuous
update functions $f_{i,G(t)}$. For technical reasons, we actually go a little further, and
prohibit the agents from using non-smooth update functions $f_{i,G(t)}$.
6.1.1 Examples.

In order to provide some context, let us mention just a few of the distributed averaging
schemes that have been proposed in the literature:

1. The max-degree method [H] involves picking $\epsilon(t)$ with the property
$\epsilon(t) \le 1/(d(t) + 1)$, where $d(t)$ is the largest degree of any agent in $G(t)$, and updating
by

$$x_i(t+1) = x_i(t) + \epsilon(t) \sum_{j \in N_i(t)} \big(x_j(t) - x_i(t)\big).$$

Here we use $N_i(t)$ to denote the set of neighbors of agent $i$ in $G(t)$. In practice,
a satisfactory $\epsilon(t)$ may not be known to all of the agents, because this requires
some global information. However, in some cases a satisfactory choice for $\epsilon(t)$
may be available, for example when an a priori upper bound on $d(t)$ is
known.

2. The Metropolis method (see Chapter 2) from [97] involves setting $\epsilon_{ij}(t)$ to satisfy
$\epsilon_{ij}(t) \le \min(1/d_i(t), 1/d_j(t))$, where $d_i(t), d_j(t)$ are the degrees of agents $i$
and $j$ in $G(t)$, and updating by

$$x_i(t+1) = x_i(t) + \sum_{j \in N_i(t)} \epsilon_{ij}(t) \big(x_j(t) - x_i(t)\big).$$

3. The load-balancing algorithm of Chapter 5 involves updating by

$$x_i(t+1) = x_i(t) + \sum_{j \in N_i(t)} a_{ij}(t) \big(x_j(t) - x_i(t)\big),$$

where $a_{ij}(t)$ is determined by the following rule: each agent selects exactly
two neighbors, the neighbor with the largest value above its own and the neighbor with
the smallest value below its own. If $i, j$ have both selected each other, then
$a_{ij}(t) = 1/3$; else $a_{ij}(t) = 0$. The intuition comes from load-balancing: agents
think of $x_i(t)$ as load to be equalized among their neighbors; they try to offload
on their lightest neighbor and take from their heaviest neighbor.

We remark that the above load-balancing algorithm is not a "local averaging
algorithm" according to our definition because $x_i(t+1)$ does not depend only on
$x_i(t)$ and the values of its neighbors; for example, agents $i$ and $j$ may not match up because $j$
has a neighbor $k$ with $x_k(t) > x_j(t)$. By contrast, the max-degree and Metropolis
algorithms are indeed "local averaging algorithms."
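The max-degree and Metropolis updates described above can be written in a few lines. This is a sketch under assumptions: the function names and the small tree used as a test graph are hypothetical, the max-degree step takes $\epsilon(t) = 1/(d(t) + 1)$, and the Metropolis step uses $\epsilon_{ij}(t) = 1/(1 + \max(d_i(t), d_j(t)))$, which is one admissible choice satisfying the stated bound rather than the only one.

```python
def max_degree_step(x, neighbors):
    """x_i <- x_i + eps * sum_j (x_j - x_i), with eps = 1 / (d_max + 1)."""
    d_max = max(len(nb) for nb in neighbors)
    eps = 1.0 / (d_max + 1)
    return [x[i] + eps * sum(x[j] - x[i] for j in neighbors[i]) for i in range(len(x))]

def metropolis_step(x, neighbors):
    """x_i <- x_i + sum_j eps_ij (x_j - x_i), with eps_ij = 1 / (1 + max(d_i, d_j))."""
    deg = [len(nb) for nb in neighbors]
    new_x = []
    for i in range(len(x)):
        change = 0.0
        for j in neighbors[i]:
            eps_ij = 1.0 / (1 + max(deg[i], deg[j]))  # symmetric in i and j
            change += eps_ij * (x[j] - x[i])
        new_x.append(x[i] + change)
    return new_x

# A small undirected tree: node 0 is joined to 1, 2, 3, and node 3 to 4.
tree = [[1, 2, 3], [0], [0], [0, 4], [3]]
x0 = [5.0, 0.0, 0.0, 0.0, 10.0]
average = sum(x0) / len(x0)

x_md, x_mt = list(x0), list(x0)
for _ in range(500):
    x_md = max_degree_step(x_md, tree)
    x_mt = metropolis_step(x_mt, tree)

print("max-degree:", x_md)
print("Metropolis:", x_mt)
```

In both schemes each agent uses only its own value, the values it received, and degree information, which is what makes them local averaging algorithms in the sense of the definition; both runs settle at the average of the initial values.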

6.1.2 Our results 

Our goal is to study the worst-case convergence time over all graph sequences. This
convergence time may be arbitrarily bad since one can insert arbitrarily many empty
graphs into the sequence $G(t)$ without violating Eq. (6.3). To avoid this trivial
situation, we require that there exist some integer $B$ such that the graphs

$$\Big(\{1, \ldots, n\},\ \bigcup_{i=kB}^{(k+1)B} E(i)\Big) \tag{6.4}$$

are connected for every integer $k$.

Let $x(t)$ be the vector in $\mathbb{R}^n$ whose $i$th component is $x_i(t)$. We define the
convergence time $T(n, \epsilon)$ of a local averaging algorithm as the time until the "sample variance"

$$V(x(t)) = \sum_{i=1}^{n} \Big(x_i(t) - \frac{1}{n} \sum_{j=1}^{n} x_j(0)\Big)^2$$

permanently shrinks by a factor of $\epsilon$, i.e., $V(x(t)) \le \epsilon V(x(0))$ for all $t \ge T(n, \epsilon)$, for
all possible $n$-node graph sequences satisfying Eq. (6.4), and all initial vectors $x(0)$
for which not all $x_i(0)$ are equal; $T(n, \epsilon)$ is defined to be the smallest number with
this property. We are interested in how $T(n, \epsilon)$ scales with $n$ and $\epsilon$.

Currently, the best available upper bound for the convergence time is obtained
with the load-balancing algorithm of Chapter 5, where it was proven that

$$T(n, \epsilon) \le C n^2 B \log\frac{1}{\epsilon}$$

for some absolute constant $C$.^1 Are there averaging algorithms
which speed up the convergence time?

Our main result is that the answer to this question is "no" within the class of
local averaging algorithms. For such algorithms we prove a general lower bound of
the form

$$T(n, \epsilon) \ge c\, n^2 B \log\frac{1}{\epsilon},$$

for some absolute constant $c$. Moreover, this lower bound holds even if we assume
that the graph sequence $G(t)$ is the same for all $t$; in fact, we prove it for the case
where $G(t)$ is a fixed "line graph."

6.2 Formal statement and proof of main result

We next state our main theorem. The theorem begins by specializing our definition
of local averaging algorithm to the case of a fixed line graph, and states a lower bound
on the convergence time in this setting.

We will use the notation $\mathbf{1}$ to denote the vector in $\mathbb{R}^n$ whose entries are all ones,
and $\mathbf{0}$ to denote the vector whose entries are all 0. The average of the initial values
$x_1(0), \ldots, x_n(0)$ will be denoted by $\bar{x}$.

^1 By "absolute constant" we mean that $C$ does not depend on the problem parameters $n$, $B$, $\epsilon$.
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Theorem 6.1. Let fi, /„ be two differentiahle functions from M? to M, and let 
f2, fs, ■ ■ ■ , fn~i be differentiahle functions from to M. Consider the dynamical 
system 



Xi{t+1) 
Xi{t+1) 
Xn{t + l) 



fl{xi{t),X2{t)), 

fi{xi{t),Xi_i{t),Xi+i{t)), i 

fn{Xn—l 



2,... 



n 



- 1 



(6.5) 



Suppose that there exists a function T{n, e) such that 



x{t) — xl\\2 



x(0) — Xl\\2 



for all n and e, all t > r{n,e), and all initial conditions a;i(0), . . . ,a;„(0) for which 
not all Xj(0) are equal. Then, 



for all e > and n > 3. 

Remark: The dynamical system described in the theorem statement is simply what 
a local averaging algorithm looks like on a line graph. The functions /i,/n are the 
update functions at the left and right endpoints of the line (which have only a single 
neighbor), while the update functions /2, fs, ■ ■ ■ , fn-i are the ones used by the middle 
agents (which have two neighbors). As a corollary, the convergence time of any local 
averaging algorithm must satisfy the lower bound T(n, e) > (l/30)n^log(l/e). 

Remark: Fix some tt, > 3. A corollary of our theorem is that there are no "local 
averaging algorithms" which compute the average in finite time. More precisely, there 
is no local averaging algorithm which, starting from initial conditions x(0) in some 
ball around the origin, always results in x{t) = xl for all times t larger than some T. 
We will sketch a proof of this after proving Theorem 1. By contrast, the existence of 
such algorithms in slightly different models of agent interactions was demonstrated 
in [SI] and [SH]. 



We first briefly sketch the proof strategy. We will begin by pointing out that must 
be an equilibrium of Eq. (16. 5p : then, we will argue that an upper bound on the 
convergence time of Eq. (16. 5 p would imply a similar convergence time bound on the 
linearization of Eq. (16. 5 p around the equilibrium of 0. This will allow us to apply 
a previous f2(n^) convergence time lower bound for linear schemes, proved by the 
authors in [85] . 

Let / (without a subscript) be the mapping from to itself that maps x{t) to 
x(t + 1) according to Eq. (16. 5p . 




1 



e 



(6.6) 
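As a rough numerical illustration of the corollary in the first remark above, the following Python sketch simulates one particular linear local averaging rule on a line graph and compares its measured convergence time with $(n^2/30)\log(1/\epsilon)$. The weight `w = 1/3` and the cosine initial condition (the slowest mode of this particular rule) are choices made here for illustration; they are not taken from the text.

```python
import math

def line_graph_step(x, w=1.0 / 3.0):
    """One round of a linear local averaging rule on the line graph.

    Each agent moves toward each of its (at most two) neighbors by a
    fraction w of the difference. The weight w is an illustrative choice.
    """
    n = len(x)
    y = list(x)
    for i in range(n):
        if i > 0:
            y[i] += w * (x[i - 1] - x[i])
        if i < n - 1:
            y[i] += w * (x[i + 1] - x[i])
    return y

def deviation_norm(x):
    """2-norm of x minus its average times the all-ones vector."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((xi - mean) ** 2 for xi in x))

def measured_convergence_time(n, eps, w=1.0 / 3.0):
    """First t at which the deviation has shrunk by a factor of eps.

    The update matrix here is symmetric and stochastic, so the deviation
    norm never increases and the first crossing time is also permanent.
    """
    # Slowest-decaying mode of this rule on the line graph (has zero mean).
    x = [math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
    d0 = deviation_norm(x)
    t = 0
    while deviation_norm(x) >= eps * d0:
        x = line_graph_step(x, w)
        t += 1
    return t

def theorem_lower_bound(n, eps):
    """The quantity (n^2 / 30) * log(1 / eps) from the corollary."""
    return (n ** 2 / 30.0) * math.log(1.0 / eps)
```

Calling `measured_convergence_time(n, eps)` for increasing `n` shows roughly quadratic growth, in line with the lower bound; the sketch checks a single algorithm and a single initial condition, so it illustrates the bound rather than proving anything.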



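6.2.1 Proof.

Lemma 6.1. $f(a\mathbf{1}) = a\mathbf{1}$, for any $a \in \mathbb{R}$.

Proof. Suppose that $x(0) = a\mathbf{1}$. Then, the initial average is $a$, so that

\[ a\mathbf{1} = \lim_{t} x(t) = \lim_{t} x(t+1) = \lim_{t} f(x(t)). \]

We use the continuity of $f$ to get

\[ a\mathbf{1} = f\bigl(\lim_{t} x(t)\bigr) = f(a\mathbf{1}). \]

□

For $i,j = 1, \ldots, n$, we define $a_{ij} = \partial f_i(\mathbf{0}) / \partial x_j$ and the matrix

\[
A = f'(\mathbf{0}) =
\begin{pmatrix}
a_{11} & a_{12} & 0 & \cdots & 0 \\
a_{21} & a_{22} & a_{23} & \cdots & 0 \\
0 & a_{32} & a_{33} & a_{34} & \vdots \\
\vdots & & \ddots & \ddots & \\
0 & \cdots & 0 & a_{n,n-1} & a_{nn}
\end{pmatrix}.
\]

Lemma 6.2. For any integer $k \ge 1$,

\[ \lim_{x \to \mathbf{0}} \frac{\|f^k(x) - A^k x\|_2}{\|x\|_2} = 0, \]

where $f^k$ refers to the $k$-fold composition of $f$ with itself.

Proof. The fact that $f(\mathbf{0}) = \mathbf{0}$ implies by the chain rule that the derivative of $f^k$ at
$x = \mathbf{0}$ is $A^k$. The above equation is a restatement of this fact. □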

Lemma 6.3. Suppose that $x^T \mathbf{1} = 0$. Then,

\[ \lim_{m \to \infty} A^m x = \mathbf{0}. \]

Proof. Let $\mathcal{B}$ be a ball around the origin such that for all $x \in \mathcal{B}$, with $x \neq \mathbf{0}$, we have

\[ \frac{\|f^k(x) - A^k x\|_2}{\|x\|_2} \le \frac{1}{4}, \qquad \text{for } k = \tau(n, 1/2). \]

Such a ball can be found due to Lemma 6.2. Since we can scale $x$ without affecting
the assumptions or conclusions of the lemma we are trying to prove, we can assume
that $x \in \mathcal{B}$. It follows that for $k = \tau(n, 1/2)$, we have

\begin{align*}
\frac{\|A^k x\|_2}{\|x\|_2}
&= \frac{\|A^k x - f^k(x) + f^k(x)\|_2}{\|x\|_2} \\
&\le \frac{1}{4} + \frac{\|f^k(x)\|_2}{\|x\|_2} \\
&\le \frac{1}{4} + \frac{1}{2} \\
&= \frac{3}{4}.
\end{align*}

Since this inequality implies that $A^k x \in \mathcal{B}$, we can apply the same argument
recursively to get

\[ \lim_{m \to \infty} (A^k)^m x = \mathbf{0}, \]

which implies the conclusion of the lemma. □



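Lemma 6.4. $A\mathbf{1} = \mathbf{1}$.

Proof. We have

\[ A\mathbf{1} = \lim_{h \to 0} \frac{f(\mathbf{0} + h\mathbf{1}) - f(\mathbf{0})}{h} = \lim_{h \to 0} \frac{h\mathbf{1}}{h} = \mathbf{1}, \]

where we used Lemma 6.1. □

Lemma 6.5. For every vector $x \in \mathbb{R}^n$,

\[ \lim_{k \to \infty} A^k x = \bar{x}\mathbf{1}, \]

where $\bar{x} = \bigl(\sum_{i=1}^n x_i\bigr)/n$.

Proof. Every vector $x$ can be written as

\[ x = \bar{x}\mathbf{1} + y, \]

where $y^T \mathbf{1} = 0$. Thus,

\[ \lim_{k \to \infty} A^k x = \lim_{k \to \infty} A^k (\bar{x}\mathbf{1} + y) = \bar{x}\mathbf{1} + \lim_{k \to \infty} A^k y = \bar{x}\mathbf{1}, \]

where we used Lemmas 6.3 and 6.4. □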

Lemma 6.6. The matrix $A$ has the following properties:

1. $a_{ij} = 0$ whenever $|i - j| > 1$.

2. The graph $G = (\{1, \ldots, n\}, E)$, with $E = \{(i,j) \mid a_{ij} \neq 0\}$, is strongly connected.

3. $A\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T A = \mathbf{1}^T$.

4. An eigenvalue of $A$ of largest modulus has modulus 1.

Proof. 1. True because of the definitions of $f$ and $A$.

2. Suppose not. Then, there is a nonempty set $S \subset \{1, \ldots, n\}$ with the property
that $a_{ij} = 0$ whenever $i \in S$ and $j \in S^c$. Consider the vector $x$ with $x_i = 0$
for $i \in S$, and $x_j = 1$ for $j \in S^c$. Clearly, $(1/n) \sum_i x_i > 0$; but $(A^k x)_i = 0$ for
$i \in S$. This contradicts Lemma 6.5.

3. The first equality was already proven in Lemma 6.4. For the second, let $b = \mathbf{1}^T A$.
Consider the vector

\[ z = \lim_{k \to \infty} A^k e_i, \tag{6.7} \]

where $e_i$ is the $i$th unit vector. By Lemma 6.5,

\[ z = \frac{\mathbf{1}^T e_i}{n} \mathbf{1} = \frac{1}{n} \mathbf{1}. \]

On the other hand,

\[ \lim_{k \to \infty} A^k e_i = \lim_{k \to \infty} A^{k+1} e_i = \lim_{k \to \infty} A^k (A e_i). \]

Applying Lemma 6.5 again, we get

\[ z = \frac{\mathbf{1}^T A e_i}{n} \mathbf{1} = \frac{b_i}{n} \mathbf{1}, \]

where $b_i$ is the $i$th component of $b$. We conclude that $b_i = 1$; since no assumption
was made on $i$, this implies that $b = \mathbf{1}^T$, which is what we needed to show.

4. We already know that $A\mathbf{1} = \mathbf{1}$, so that an eigenvalue with modulus 1 exists. Now
suppose there is an eigenvalue with larger modulus, that is, there is some vector
$x \in \mathbb{C}^n$ such that $Ax = \lambda x$ and $|\lambda| > 1$. Then $\lim_k \|A^k x\|_2 = \infty$. By writing
$x = x_{\mathrm{real}} + i x_{\mathrm{imaginary}}$, we immediately have that $A^k x = A^k x_{\mathrm{real}} + i A^k x_{\mathrm{imaginary}}$.
But by Lemma 6.5, both $A^k x_{\mathrm{real}}$ and $A^k x_{\mathrm{imaginary}}$ approach some finite multiple
of $\mathbf{1}$ as $k \to \infty$, so $\|A^k x\|_2$ is bounded above. This is a contradiction.

□

Theorem 6.2 (Eigenvalue lemma). If $A$ satisfies all of the conclusions of Lemma 6.6,
then $A$ has an eigenvector $v$, with real eigenvalue $\lambda \in (1 - 6/n^2, 1)$, such that $v^T \mathbf{1} = 0$.

The proof of this fact needs its own section.



6.3 Proof of the eigenvalue lemma 

Lemma 6.7. Consider an $n \times n$ matrix $A$ and let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be its eigenvalues,
sorted in order of decreasing magnitude. Suppose that the following conditions hold.

(a) We have $\lambda_1 = 1$ and $A\mathbf{1} = \mathbf{1}$.

(b) There exists a positive vector $\pi$ such that $\pi^T A = \pi^T$.

(c) For every $i$ and $j$, we have $\pi_i a_{ij} = \pi_j a_{ji}$.

Let

\[ S = \Bigl\{ x \Bigm| \sum_{i=1}^n \pi_i x_i = 0,\ \sum_{i=1}^n \pi_i x_i^2 = 1 \Bigr\}. \]

Then, all eigenvalues of $A$ are real, and

\[ \lambda_2 = 1 - \frac{1}{2} \min_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (x_i - x_j)^2. \tag{6.8} \]

In particular, for any vector $y$ that satisfies $\sum_{i=1}^n \pi_i y_i = 0$, we have

\[ \lambda_2 \ge 1 - \frac{\sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (y_i - y_j)^2}{2 \sum_{i=1}^n \pi_i y_i^2}. \tag{6.9} \]

Proof. Let $D$ be a diagonal matrix whose $i$th diagonal entry is $\pi_i$. Condition (c)
yields $DA = A^T D$. We define the inner product $\langle \cdot, \cdot \rangle_\pi$ by $\langle x, y \rangle_\pi = x^T D y$. We then
have

\[ \langle x, Ay \rangle_\pi = x^T D A y = x^T A^T D y = \langle Ax, y \rangle_\pi. \]

Therefore, $A$ is self-adjoint with respect to this inner product, which proves that $A$
has real eigenvalues.

Since the largest eigenvalue is 1, with an eigenvector of $\mathbf{1}$, we use the variational
characterization of the eigenvalues of a self-adjoint matrix (Chapter 7, Theorem 4.3
of [90]) to obtain

\begin{align*}
\lambda_2 &= \max_{x \in S} \langle x, Ax \rangle_\pi \\
&= \max_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} x_i x_j \\
&= \frac{1}{2} \max_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} \bigl( x_i^2 + x_j^2 - (x_i - x_j)^2 \bigr).
\end{align*}

For $x \in S$, we have

\[ \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (x_i^2 + x_j^2) = 2 \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} x_i^2 = 2 \sum_{i=1}^n \pi_i x_i^2 = 2 \langle x, x \rangle_\pi = 2, \]

which yields

\[ \lambda_2 = 1 - \frac{1}{2} \min_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (x_i - x_j)^2. \]

Finally, Eq. (6.9) follows from (6.8) by considering the vector $x_i = y_i / \sqrt{\sum_{j=1}^n \pi_j y_j^2}$.

□
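To make the bound (6.9) concrete, the sketch below evaluates its right-hand side for one symmetric tridiagonal matrix and compares it with that matrix's second-largest eigenvalue. The matrix (weight 1/3 on each edge of a path), the choice $\pi_i = 1$, and the test vector $y_i = i - (n+1)/2$ are assumptions made here for illustration. The closed-form eigenpair used below is the standard one for the path-graph Laplacian, and the code verifies it through a residual rather than relying on it blindly.

```python
import math

def path_matrix(n, w=1.0 / 3.0):
    """Symmetric stochastic tridiagonal matrix with weight w on each path edge."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                A[i][j] = w
        A[i][i] = 1.0 - sum(A[i])  # remaining mass stays on the diagonal
    return A

def rayleigh_lower_bound(A, pi, y):
    """Right-hand side of Eq. (6.9); y must satisfy sum_i pi[i] * y[i] = 0."""
    n = len(A)
    num = sum(pi[i] * A[i][j] * (y[i] - y[j]) ** 2
              for i in range(n) for j in range(n))
    den = 2.0 * sum(pi[i] * y[i] ** 2 for i in range(n))
    return 1.0 - num / den

def second_eigenpair(n, w=1.0 / 3.0):
    """Closed-form second-largest eigenvalue and eigenvector of path_matrix(n, w)."""
    lam = 1.0 - 2.0 * w * (1.0 - math.cos(math.pi / n))
    v = [math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
    return lam, v

def eigen_residual(A, lam, v):
    """Largest entry of |A v - lam v|, used to sanity-check the closed form."""
    n = len(A)
    return max(abs(sum(A[i][j] * v[j] for j in range(n)) - lam * v[i])
               for i in range(n))
```

For this matrix the bound from (6.9) and the true eigenvalue coincide when $n = 3$ (the test vector is then itself an eigenvector) and separate slightly for larger $n$.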

With the previous lemma in place, we can now prove Theorem 6.2.

Proof of Theorem 6.2. If the entries of $A$ were all nonnegative, we would be dealing
with a birth-death Markov chain. Such a chain is reversible, i.e., satisfies the detailed
balance equations $\pi_i a_{ij} = \pi_j a_{ji}$ (condition (c) in Lemma 6.7). In fact the derivation
of the detailed balance equations does not make use of nonnegativity; thus, detailed
balance holds in our case as well. Since $\pi_i = 1$ by assumption, we have that our
matrix $A$ is symmetric.

For $i = 1, \ldots, n$, let $y_i = i - (n+1)/2$; observe that $\sum_{i=1}^n y_i = 0$. We will make
use of the inequality (6.9). Since $a_{ij} = 0$ whenever $|i - j| > 1$, we have

n n n n 

^ TTiaijiyi - Vjf aij = n. (6.10) 

i=l j=l i=l j=l 

Furthermore, 

n n ^ 3 

E-^^NE(^-^)'>?^- (6-11) 

1=1 i=l 

The last inequality follows from the well known fact $\mathrm{var}(X) = (n^2 - 1)/12$ for a
discrete uniform random variable $X$ on $\{1, \ldots, n\}$. Using the inequality (6.9) and
Eqs. (6.10)-(6.11), we obtain the desired bound on the eigenvalue. □



Remark: Note that if the matrix $A$ is as in the previous theorem, it is possible for
the iteration $x(t+1) = Ax(t)$ not to converge at all. Indeed, nothing in the argument
precludes the possibility that the smallest eigenvalue is $-1$, for example. In such
a case, the lower bounds of the theorem, which are derived by bounding the second
largest eigenvalue, still hold, as the convergence rate and time are infinite.

Remark: We could have saved ourselves a few lines by appealing to the results of
[18] once we showed $A$ is symmetric.



6.4 Proof of the main theorem 

We are now in a position to finally prove the main result of this chapter. 

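Proof of Theorem 6.1. Let $v$ be an eigenvector of $A$ with the properties guaranteed by
Theorem 6.2, and let $\lambda$ be the corresponding eigenvalue. Fix a positive integer $k$.
Let $\epsilon > 0$ and pick $x \neq \mathbf{0}$ to be a small enough multiple of $v$ so that

\[ \frac{\|f^k(x) - A^k x\|_2}{\|x\|_2} \le \epsilon. \]

This is possible by Lemma 6.2. Then, we have

\[ \frac{\|f^k(x)\|_2}{\|x\|_2} \ge \frac{\|A^k x\|_2}{\|x\|_2} - \epsilon = \lambda^k - \epsilon \ge \Bigl(1 - \frac{6}{n^2}\Bigr)^k - \epsilon. \]

Using the orthogonality property $v^T \mathbf{1} = 0$, we have $\bar{x} = 0$. Since we placed no
restriction on $\epsilon$, this implies that

\[ \sup_{x} \frac{\|f^k(x) - \bar{x}\mathbf{1}\|_2}{\|x - \bar{x}\mathbf{1}\|_2} \ge \Bigl(1 - \frac{6}{n^2}\Bigr)^k, \]

where the supremum is over all $x$ whose components are not all equal.
Plugging $k = \tau(n,\epsilon)$ into this equation, we see that

\[ \epsilon \ge \Bigl(1 - \frac{6}{n^2}\Bigr)^{\tau(n,\epsilon)}. \]

Since $n \ge 3$, we have $1 - 6/n^2 \in (0, 1)$, and

\[ \tau(n,\epsilon) \ge \frac{1}{\log(1 - 6/n^2)} \log \epsilon. \]

Now using the bound $\log(1 - \alpha) \ge -5\alpha$ for $\alpha \in [0, 2/3]$, applied with $\alpha = 6/n^2$, we get

\[ \tau(n,\epsilon) \ge \frac{n^2}{30} \log\frac{1}{\epsilon}. \]

□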

Remark: We now sketch the proof of the claim we made earlier that a local averaging
algorithm cannot average in finite time. Fix $n \ge 3$. Suppose that for any $x(0)$ in
some ball $\mathcal{B}$ around the origin, a local averaging algorithm results in $x(t) = \bar{x}\mathbf{1}$ for all
$t \ge T$.

The proof of Theorem 6.1 shows that given any $k, \epsilon > 0$, one can pick a vector $v(\epsilon)$
so that if $x(0) = v(\epsilon)$ then $V(x(k))/V(x(0)) \ge (1 - 6/n^2)^k - \epsilon$. Moreover, the vectors
$v(\epsilon)$ can be chosen to be arbitrarily small. One simply picks $k = T$ and $\epsilon < (1 - 6/n^2)^T$
to get that $x(T)$ is not a multiple of $\mathbf{1}$; and furthermore, picking $v(\epsilon)$ small enough
in norm to be in $\mathcal{B}$ results in a contradiction.

Remark: Theorem 6.1 gives a lower bound on how long we must wait for the 2-norm
$\|x(t) - \bar{x}\mathbf{1}\|_2$ to shrink by a factor of $\epsilon$. What if we replace the 2-norm with other
norms, for example with the $\infty$-norm? Since $B_\infty(0, r/\sqrt{n}) \subset B_2(0, r) \subset B_\infty(0, r)$, it
follows that if the $\infty$-norm shrinks by a factor of $\epsilon$, then the 2-norm must shrink by
at least $\sqrt{n}\,\epsilon$. Since $\epsilon$ only enters the lower bound of Theorem 6.1 logarithmically,
the answer only changes by a factor of $\log n$ in passing to the $\infty$-norm. A similar
argument shows that, modulo some logarithmic factors, it makes no difference which
$p$-norm is used.

6.5 Concluding remarks 

We have proved a lower bound on the convergence time of local averaging algorithms 
which scales quadratically in the number of agents. This lower bound holds even if 
all the communication graphs are equal to a fixed line graph. Our work points to a 
number of open questions. 

1. Is it possible to loosen the definition of local averaging algorithms to encompass
a wider class of algorithms? In particular, is it possible to weaken the requirement
that each $f_{i,G(t)}$ be smooth, perhaps only to the requirement that it be
piecewise-smooth or continuous, and still obtain an $\Omega(n^2)$ lower bound?

2. Does the worst-case convergence time change if we introduce some memory and
allow $x_i(t+1)$ to depend on the last $k$ sets of messages received by agent $i$?
Alternatively, there is the broader question of how much there is to be gained
if every agent is allowed to keep track of extra variables. Some positive results
in this direction were obtained in [53].

3. What if each node maintains a small number of update functions, and is allowed
to choose which of them to apply? Our lower bound does not apply to such
schemes, so it is an open question whether it is possible to design practical
algorithms along these lines with worst-case convergence time scaling better than
$n^2$.

In general, it would be nice to understand the relationship between the structure
of classes of averaging algorithms (e.g., how much memory they use, whether the
updates are linear) and the best achievable performance.

Chapter 7

Quantized averaging

In this chapter, we consider a quantized version of the update rule in Eq. (3.1). This
model is a good approximation for a network of nodes communicating through finite
bandwidth channels, so that at each time instant, only a finite number of bits can be
transmitted. We incorporate this constraint in our algorithm by assuming that each
node, upon receiving the values of its neighbors, computes the convex combination
$\sum_{j=1}^n a_{ij}(k) x_j(k)$ and quantizes it. This update rule also captures a constraint that
each node can only store quantized values.

Unfortunately, under Assumptions 3.1, 3.2, and 3.4, if the output of Eq. (3.1)
is rounded to the nearest integer, the sequence $x(k)$ is not guaranteed to converge
to consensus; see [M] for an example. We therefore choose a quantization rule that
rounds the values down, according to

\[ x_i(k+1) = \Bigl\lfloor \sum_{j=1}^n a_{ij}(k) x_j(k) \Bigr\rfloor, \tag{7.1} \]

where $\lfloor \cdot \rfloor$ represents rounding down to the nearest multiple of $1/Q$, and where $Q$ is
some positive integer.

We adopt the natural assumption that the initial values are already quantized.

Assumption 7.1. For all $i$, $x_i(0)$ is a multiple of $1/Q$.

We will next demonstrate that starting from multiples of $1/Q$, arbitrarily accurate
quantized computation of the average is possible provided the number of bits used to
quantize is at least on the order of $\log n$. Moreover, the time scaling with $n$ required
to do this is still on the order of $n^2$, as in the previous chapters. Thus Eq. (7.1),
despite its simplicity, turns out to have excellent performance in the quantized
setting.

Our exposition in this chapter will follow the paper [76], where the results
described here have previously appeared.

7.1 A quantization level dependent bound

For convenience we define

\[ U = \max_i x_i(0), \qquad L = \min_i x_i(0). \]

We use $K$ to denote the total number of relevant quantization levels, i.e.,

\[ K = (U - L)Q, \]

which is an integer by Assumption 7.1.

We first present a convergence time bound that depends on the quantization level $Q$.

Proposition 7.1. Let Assumptions 3.1 (non-vanishing weights), 3.4 (double
stochasticity), 3.2 ($B$-connectivity), and 7.1 (quantized initial values) hold. Let $\{x(k)\}$ be
generated by the update rule (7.1). If $k \ge nBK$, then all components of $x(k)$ are
equal.

Proof. Consider the nodes whose initial value is $U$. There are at most $n$ of them. As
long as not all entries of $x(k)$ are equal, then every $B$ iterations, at least one node
must use a value strictly less than $U$ in an update; such a node will have its value
decreased to $U - 1/Q$ or less. It follows that after $nB$ iterations, the largest node
value will be at most $U - 1/Q$. Repeating this argument, we see that at most $nBK$
iterations are possible before all the nodes have the same value. □
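The following Python sketch simulates the round-down update on a small example and checks the $nBK$ bound of Proposition 7.1. It is an illustration under assumptions chosen here, not part of the original analysis: the graph is a fixed line graph (so $B = 1$), every edge has weight 1/3, and values are stored as integer multiples of $1/Q$ so that the arithmetic is exact.

```python
from fractions import Fraction
from math import floor

def line_weights(n):
    """Doubly stochastic weights on a line graph: 1/3 per edge, rest on the diagonal."""
    w = Fraction(1, 3)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                A[i][j] = w
        A[i][i] = 1 - sum(A[i])
    return A

def quantized_step(levels, A):
    """One round of the round-down update.

    levels[i] is the integer Q * x_i, so rounding x down to a multiple of 1/Q
    is the same as taking the integer floor of the weighted sum of levels.
    """
    n = len(levels)
    return [floor(sum(A[i][j] * levels[j] for j in range(n))) for i in range(n)]

def run_to_consensus(levels, A, max_steps):
    """Return (k, levels) for the first k <= max_steps at which all entries agree."""
    for k in range(max_steps + 1):
        if len(set(levels)) == 1:
            return k, levels
        levels = quantized_step(levels, A)
    return None, levels

def proposition_bound(levels, B=1):
    """n * B * K, where K = (U - L) * Q is the spread measured in levels."""
    return len(levels) * B * (max(levels) - min(levels))
```

With the initial levels `[0, 1, 4, 2, 3]` the run reaches consensus well before the bound, and the common value lies below the true average, which anticipates the discussion of quantization error later in the chapter.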

Although the above bound gives informative results for small K, it becomes weaker 
as Q (and, therefore, K) increases. On the other hand, as Q approaches infinity, the 
quantized system approaches the unquantized system; the availability of convergence 
time bounds for the unquantized system suggests that similar bounds should be pos- 
sible for the quantized one. Indeed, in the next section, we adopt a notion of conver- 
gence time parallel to our notion of convergence time for the unquantized algorithm; 
as a result, we obtain a bound on the convergence time which is independent of the 
total number of quantization levels. 



7.2 A quantization level independent bound 

We adopt a slightly different measure of convergence for the analysis of the quantized
consensus algorithm. For any $x \in \mathbb{R}^n$, we define $m(x) = \min_i x_i$ and

\[ \underline{V}(x) = \sum_{i=1}^n (x_i - m(x))^2. \]

We will also use the simpler notation $m(k)$ and $\underline{V}(k)$ to denote $m(x(k))$ and $\underline{V}(x(k))$,
respectively, where it is more convenient to do so. The function $\underline{V}$ will be our
Lyapunov function for the analysis of the quantized consensus algorithm. The reason for
not using our earlier Lyapunov function, $V$, is that for the quantized algorithm, $V$ is
not guaranteed to be monotonically nonincreasing in time. On the other hand, we
have that $V(x) \le \underline{V}(x) \le 4nV(x)$ for any $x \in \mathbb{R}^n$. As a consequence, any
convergence time bounds expressed in terms of $\underline{V}$ translate to essentially the same bounds
expressed in terms of $V$, up to a logarithmic factor.
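The two-sided comparison between the variance and the min-centered Lyapunov function can be checked directly on examples. The sketch below is a small illustration; the function names `V` and `V_lower` are introduced here and use exact rational arithmetic so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def V(x):
    """Sum of squared deviations from the mean."""
    mean = Fraction(sum(x), len(x))
    return sum((xi - mean) ** 2 for xi in x)

def V_lower(x):
    """Sum of squared deviations from the minimum entry."""
    m = min(x)
    return sum((xi - m) ** 2 for xi in x)

def check_bounds(x):
    """True when V(x) <= V_lower(x) <= 4 n V(x) for this particular x."""
    n = len(x)
    return V(x) <= V_lower(x) <= 4 * n * V(x)
```

For example, for `x = [0, 2]` the two functions evaluate to 2 and 4 respectively, which sits comfortably inside the stated range.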

Before proceeding, we record an elementary fact which will allow us to relate the
variance decrease $V(x) - V(y)$ to the decrease, $\underline{V}(x) - \underline{V}(y)$, of our new Lyapunov
function. The proof involves simple algebra, and is therefore omitted.

Lemma 7.1. Let $u_1, \ldots, u_n$ and $w_1, \ldots, w_n$ be real numbers satisfying

\[ \sum_{i=1}^n u_i = \sum_{i=1}^n w_i. \]

Then, the expression

\[ f(z) = \sum_{i=1}^n (u_i - z)^2 - \sum_{i=1}^n (w_i - z)^2 \]

is a constant, independent of the scalar $z$.
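Although the proof is omitted, the algebra is short enough to record. Writing $u_i$ and $w_i$ for the two sets of numbers in the lemma, expanding both squares and using the assumption that the two sums agree gives:

```latex
\begin{align*}
f(z) &= \sum_{i=1}^n (u_i - z)^2 - \sum_{i=1}^n (w_i - z)^2 \\
     &= \sum_{i=1}^n u_i^2 - 2z \sum_{i=1}^n u_i + n z^2
        - \sum_{i=1}^n w_i^2 + 2z \sum_{i=1}^n w_i - n z^2 \\
     &= \sum_{i=1}^n u_i^2 - \sum_{i=1}^n w_i^2
        - 2z \Bigl( \sum_{i=1}^n u_i - \sum_{i=1}^n w_i \Bigr) \\
     &= \sum_{i=1}^n u_i^2 - \sum_{i=1}^n w_i^2 .
\end{align*}
```

The last line does not involve $z$, which is the claim.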

Footnote (on the inequality $V(x) \le \underline{V}(x) \le 4nV(x)$): The first inequality follows
because $\sum_i (x_i - z)^2$ is minimized when $z$ is the mean of the vector $x$; to establish the
second inequality, observe that it suffices to consider the case when the mean of
$x$ is 0 and $V(x) = 1$. In that case, the largest distance between $m$ and any $x_i$ is 2 by
the triangle inequality, so $\underline{V}(x) \le 4n$.

Our next lemma places a bound on the decrease of the Lyapunov function $\underline{V}(t)$
between times $kB$ and $(k+1)B - 1$.

Lemma 7.2. Let Assumptions 3.1, 3.4, 4.1, and 7.1 hold. Let $\{x(k)\}$ be generated by
the update rule (7.1). Suppose that the components $x_i(kB)$ of the vector $x(kB)$ have
been ordered from largest to smallest, with ties broken arbitrarily. Then, we have

\[ \underline{V}(kB) - \underline{V}((k+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \bigl( x_i(kB) - x_{i+1}(kB) \bigr)^2. \]

Proof. For all $k$, we view Eq. (7.1) as the composition of two operators:

\[ y(k) = A(k) x(k), \]

where $A(k)$ is a doubly stochastic matrix, and

\[ x(k+1) = \lfloor y(k) \rfloor, \]

where the quantization $\lfloor \cdot \rfloor$ is carried out componentwise.

We apply Lemma 7.1 with the identification $u_i = x_i(k)$, $w_i = y_i(k)$. Since
multiplication by a doubly stochastic matrix preserves the mean, the condition $\sum_i u_i = \sum_i w_i$
is satisfied. By considering two different choices for the scalar $z$, namely, $z_1 = \bar{x}(k) = \bar{y}(k)$
and $z_2 = m(k)$, we obtain

\[ V(x(k)) - V(y(k)) = \underline{V}(x(k)) - \sum_{i=1}^n (y_i(k) - m(k))^2. \tag{7.2} \]

Note that $x_i(k+1) - m(k) \le y_i(k) - m(k)$. Therefore,

\[ \underline{V}(x(k)) - \sum_{i=1}^n (y_i(k) - m(k))^2 \le \underline{V}(x(k)) - \sum_{i=1}^n (x_i(k+1) - m(k))^2. \tag{7.3} \]

Furthermore, note that since $x_i(k+1) \ge m(k+1) \ge m(k)$ for all $i$, we have that
$x_i(k+1) - m(k+1) \le x_i(k+1) - m(k)$. Therefore,

\[ \underline{V}(x(k)) - \sum_{i=1}^n (x_i(k+1) - m(k))^2 \le \underline{V}(x(k)) - \underline{V}(x(k+1)). \tag{7.4} \]

By combining Eqs. (7.2), (7.3), and (7.4), we obtain

\[ V(x(t)) - V(y(t)) \le \underline{V}(x(t)) - \underline{V}(x(t+1)) \qquad \text{for all } t. \]

Summing the preceding relations over $t = kB, \ldots, (k+1)B - 1$, we further obtain

\[ \sum_{t=kB}^{(k+1)B-1} \bigl( V(x(t)) - V(y(t)) \bigr) \le \underline{V}(x(kB)) - \underline{V}(x((k+1)B)). \]

To complete the proof, we provide a lower bound on the expression

\[ \sum_{t=kB}^{(k+1)B-1} \bigl( V(x(t)) - V(y(t)) \bigr). \]

Since $y(t) = A(t)x(t)$ for all $t$, it follows from Lemma 4.1 that for any $t$,

\[ V(x(t)) - V(y(t)) = \sum_{i<j} w_{ij}(t) (x_i(t) - x_j(t))^2, \]

where Wij(t) is the (i, j)-th entry of A{t)'^A{t). Using this relation and following the 
same line of analysis used in the proof of Lemma [4.41 [where the relation Xi{t) > y^^ 
holds in view of the assumption that Xi{kB) is a multiple of 1/Q for all k > 0, cf. 
Assumption 17.1] , we obtain the desired result. □ 

The next theorem contains our main result on the convergence time of the
quantized algorithm.

Theorem 7.1. Let Assumptions 3.1 (non-vanishing weights), 3.4 (double
stochasticity), 4.1 (connectivity relaxation), and 7.1 hold. Let $\{x(k)\}$ be generated by the
update rule (7.1). Then, there exists an absolute constant $c$ such that we have

\[ \underline{V}(k) \le \epsilon \underline{V}(0) \qquad \text{for all } k \ge c (n^2/\eta) B \log(1/\epsilon). \]

Proof. Let us assume that $\underline{V}(kB) > 0$. From Lemma 7.2, we have

\[ \underline{V}(kB) - \underline{V}((k+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \bigl( x_i(kB) - x_{i+1}(kB) \bigr)^2, \]

where the components $x_i(kB)$ are ordered from largest to smallest. Since
$\underline{V}(kB) = \sum_{i=1}^n (x_i(kB) - x_n(kB))^2$, we have

\[ \frac{\underline{V}(kB) - \underline{V}((k+1)B)}{\underline{V}(kB)} \ge \frac{\eta}{2} \, \frac{\sum_{i=1}^{n-1} (x_i(kB) - x_{i+1}(kB))^2}{\sum_{i=1}^n (x_i(kB) - x_n(kB))^2}. \]

Let $y_i = x_i(kB) - x_n(kB)$. Clearly, $y_i \ge 0$ for all $i$, and $y_n = 0$. Moreover, the
monotonicity of $x_i(kB)$ implies the monotonicity of $y_i$:

\[ y_1 \ge y_2 \ge \cdots \ge y_n = 0. \]

Thus,

\[ \frac{\underline{V}(kB) - \underline{V}((k+1)B)}{\underline{V}(kB)} \ge \frac{\eta}{2} \min_{y_1 \ge \cdots \ge y_n = 0} \frac{\sum_{i=1}^{n-1} (y_i - y_{i+1})^2}{\sum_{i=1}^n y_i^2}. \]

Next, we simply repeat the steps of Lemma 4.5. We can assume without loss of
generality that $\sum_{i=1}^n y_i^2 = 1$. Define $z_i = y_i - y_{i+1}$ for $i = 1, \ldots, n-1$ and $z_n = 0$.
We have that $z_i$ are all nonnegative and $\sum_i z_i = y_1 - y_n \ge 1/\sqrt{n}$. Therefore,

\[ \frac{\eta}{2} \min_{y_1 \ge \cdots \ge y_n = 0,\ \sum_i y_i^2 = 1} \sum_{i=1}^{n-1} (y_i - y_{i+1})^2 \ge \frac{\eta}{2} \min_{z_i \ge 0,\ \sum_i z_i \ge 1/\sqrt{n}} \sum_{i=1}^{n} z_i^2. \]

The minimization problem on the right-hand side has an optimal value of at least
$1/n^2$, and the desired result follows. □



7.3 Extensions and modifications 

In this section, we comment briefly on some corollaries of Theorem 7.1.

First, we note that the results of Chapter 5 immediately carry over to the quantized
case. Indeed, in Chapter 5 we showed how to pick the weights $a_{ij}(k)$ in a decentralized
manner, based only on local information, so that Assumptions 3.1 and 4.1 are satisfied,
with $\eta \ge 1/3$. When using a quantized version of the load-balancing algorithm of
Chapter 5, we once again manage to remove the factor of $1/\eta$ from our upper bound.

Proposition 7.2. For the quantized version of the load-balancing algorithm of
Chapter 5, and under the same assumptions as in the corresponding theorem of Chapter 5,
if $k \ge c\, n^2 B \log(1/\epsilon)$, then $\underline{V}(k) \le \epsilon \underline{V}(0)$, where $c$ is an absolute constant.

Second, we note that Theorem 7.1 can be used to obtain a bound on the time
until the values of all nodes are equal. Indeed, we observe that in the presence of
quantization, once the condition $\underline{V}(k) < 1/Q^2$ is satisfied, all components of $x(k)$
must be equal.

Proposition 7.3. Consider the quantized algorithm (7.1), and assume that
Assumptions 3.1 (non-vanishing weights), 3.4 (double stochasticity), 4.1 (connectivity
relaxation), and 7.1 (quantized initial values) hold. If $k \ge c (n^2/\eta) B \, [\log Q + \log \underline{V}(0)]$,
then all components of $x(k)$ are equal, where $c$ is an absolute constant.



7.4 Quantization error 



Despite favorable convergence properties of our quantized averaging algorithm (7.1),
the update rule does not preserve the average of the values at each iteration.
Therefore, the common limit of the sequences $x_i(k)$, denoted by $x_f$, need not be equal to
the exact average of the initial values. We next provide an upper bound on the error
between $x_f$ and the initial average, as a function of the number of quantization levels.

Proposition 7.4. There is an absolute constant $c$ such that for the common limit $x_f$
of the values $x_i(k)$ generated by the quantized algorithm (7.1), we have

\[ \Bigl| x_f - \frac{1}{n} \sum_{i=1}^n x_i(0) \Bigr| \le \frac{c}{Q} \, \frac{n^2}{\eta} \, B \log(Q n (U - L)). \]

Proof. By Proposition 7.3, after $O\bigl((n^2/\eta) B \log(Q \underline{V}(x(0)))\bigr)$ iterations, all nodes will
have the same value. Since $\underline{V}(x(0)) \le n(U - L)^2$ and the average decreases by at
most $1/Q$ at each iteration, the result follows. □

Let us assume that the parameters $B$, $\eta$, and $U - L$ are fixed. Proposition 7.4
implies that as $n$ increases, the number of bits used for each communication, which
is proportional to $\log Q$, needs to grow only as $O(\log n)$ to make the error negligible.
Furthermore, this is true even if the parameters $B$, $1/\eta$, and $U - L$ grow polynomially
in $n$.



For a converse, it can be seen that $\Omega(\log n)$ bits are needed. Indeed, consider $n$ nodes,
with $n/2$ nodes initialized at 0, and $n/2$ nodes initialized at 1. Suppose that $Q < n/2$;
we connect the nodes by forming a complete subgraph over all the nodes with value
0 and exactly one node with value 1; see Figure 7-1 for an example with $n = 6$. Then,
each node forms the average of its neighbors. This brings one of the nodes with an
initial value of 1 down to 0, without raising the value of any other nodes. We can
repeat this process, to bring all of the nodes with an initial value of 1 down to 0.
Since the true average is 1/2, the final result is 1/2 away from the true average. Note
now that $Q$ can grow linearly with $n$, and still satisfy the inequality $Q < n/2$. Thus,
the number of bits can grow as $\Omega(\log n)$, and yet, independent of $n$, the error remains
1/2.

Figure 7-1: Initial configuration. Each node takes the average value of its neighbors.
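The construction above is easy to replay in code. The sketch below is a hedged illustration: the function names are introduced here, nodes outside the chosen complete subgraph are treated as isolated in that round (so they keep their values), and exact rational arithmetic is used.

```python
from fractions import Fraction
from math import floor

def quantize_down(value, Q):
    """Round a rational value down to the nearest multiple of 1/Q."""
    return Fraction(floor(value * Q), Q)

def adversarial_run(n, Q):
    """Replay the lower-bound construction with n/2 zeros and n/2 ones.

    In each round, all current zero-valued nodes and exactly one one-valued
    node form a complete subgraph and replace their values by the
    rounded-down average over that subgraph.
    """
    if n % 2 != 0 or not (0 < Q < n / 2):
        raise ValueError("need even n and 0 < Q < n/2")
    values = [Fraction(0)] * (n // 2) + [Fraction(1)] * (n // 2)
    rounds = 0
    while any(v == 1 for v in values):
        zeros = [i for i, v in enumerate(values) if v == 0]
        one = next(i for i, v in enumerate(values) if v == 1)
        clique = zeros + [one]
        average = sum(values[i] for i in clique) / len(clique)
        new_value = quantize_down(average, Q)
        for i in clique:
            values[i] = new_value
        rounds += 1
    return values, rounds
```

Because the subgraph always has more than $Q$ nodes, its average is below $1/Q$ and rounds down to 0, so each round removes one node with value 1 and the run ends with every node at 0.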

7.5 Concluding remarks 

The high-level summary of this chapter is that $c \log n$ bits suffice for quantized
averaging. The answers obtained will not be exact, but the error can be made arbitrarily
small by picking $c$ large, and the favorable convergence times from the earlier chapters
are retained.

An interesting direction is whether it is possible to reduce the number of bits even
further, to a constant number per link maintained by a node (at least one bit
per link is necessary merely to store incoming messages). A positive answer in the
case of fixed graphs is provided by the following chapter. On dynamic graphs, the
number of bits required by deterministic averaging algorithms is still not understood.

Chapter 8

Averaging with a constant number of bits per link

The previous chapter analyzed the performance of a quantized update rule. The
"punchline" was that if the number of bits involved in the quantization is on the
order of $\log n$, then arbitrarily accurate averaging is possible.

This chapter analyzes what happens if we try to push down the number of bits
stored at each node to a constant. Our exposition in this chapter will follow the
preprint [50], where the results described here have previously appeared.

We will assume that at any time step, nodes can exchange binary messages with 
their neighbors in an interconnection graph which is undirected and unchanging with 
time. Naturally, a node needs to store at least one bit for each of the links just 
to store the message arriving on that link. We will allow the nodes to maintain a 
constant number of bits for each of their links. Thus in a constant-degree network, 
this translates to a constant number of bits at each node. 

Supposing that the nodes begin with numbers $x_i \in \{0, 1, \ldots, K\}$, we will say that
a function of the initial values is computable with constant storage if it is computable
subject to the restrictions in the previous paragraph. The exact average is not
computable with constant storage because just storing the average takes $\Omega(\log n)$ bits.
Thus we will turn to the related, but weaker, question of deciding whether the 0's or
1's are in the majority.

In fact, we will study a more general problem called "interval averaging," which
asks to return the set among $\{0\}, (0,1), \{1\}, (1,2), \ldots, (K-1,K), \{K\}$ within which
the average of the initial numbers lies. If interval averaging is possible, then majority
computation with binary initial conditions is possible as well. Indeed, the argument
for this is simple: to do majority computation, every node beginning with $x_i = 1$
instead sets $x_i = 2$ and runs interval averaging with $K = 2$. Depending on whether
the average is in $\{0\}$, $(0,1)$, $\{1\}$, $(1,2)$, or $\{2\}$, each node knows which initial condition (if
any) had the majority.
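The reduction from majority to interval averaging can be spelled out in a few lines. The sketch below is centralized and only illustrates the input/output relationship; it is not the distributed constant-storage algorithm of this chapter, and the tuple encoding of the intervals is a convention chosen here.

```python
from fractions import Fraction

def interval_average(values, K):
    """Return the set among {0}, (0,1), {1}, ..., (K-1,K), {K} containing the mean.

    Encoding (an illustrative convention): ("point", m) for {m} and
    ("open", m, m + 1) for the open interval (m, m + 1).
    """
    if any(v < 0 or v > K for v in values):
        raise ValueError("values must lie in {0, ..., K}")
    avg = Fraction(sum(values), len(values))
    if avg.denominator == 1:
        return ("point", int(avg))
    lo = avg.numerator // avg.denominator
    return ("open", lo, lo + 1)

def majority(bits):
    """Decide the majority of 0/1 inputs via interval averaging with K = 2."""
    doubled = [2 * b for b in bits]  # every node holding 1 uses 2 instead
    label = interval_average(doubled, K=2)
    if label[0] == "point":
        return {0: "zeros", 1: "tie", 2: "ones"}[label[1]]
    return "zeros" if label[1] == 0 else "ones"
```

Doubling the inputs is what makes a tie visible: an exact tie maps to the single point $\{1\}$ instead of falling inside an open interval.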

The interval averaging problem has trivial solutions with randomized algorithms
(see Chapter 2). These algorithms "centralize" the problem by electing a leader and
streaming information towards the leader. Decentralized randomized algorithms are
also possible [SUHO]. In this chapter, we address the question of whether it is possible
to solve this problem with deterministic algorithms.

The main result of this chapter is the following theorem.

Theorem 8.1. Interval averaging is computable with constant storage. Moreover,
there exists a (constant storage) algorithm for it under which every node has the
correct answer after $O(n^2 K^2 \log K)$ rounds of communication.

The remainder of this chapter will be devoted to the proof of this fact. Let us
briefly summarize the main idea behind the computation of interval averages. Imagine
the integer input value $x_i$ as represented by a number of $x_i$ pebbles at node $i$. The
algorithm attempts to exchange pebbles between nodes with unequal numbers so that
the overall distribution becomes more even. Eventually, either all nodes will have the
same number of pebbles, or some will have a certain number and others just one
more. The nodes will try to detect which of these two possibilities has occurred, and
will estimate the interval average accordingly.
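The pebble picture can be illustrated with a deliberately naive, centralized sketch. It assumes a global view in which a pebble may be moved between any two nodes, which the actual algorithm does not have; the point is only to show why the final configuration determines the interval average.

```python
def even_out_pebbles(pebbles):
    """Move one pebble at a time from a fullest node to an emptiest node.

    Stops when the counts differ by at most one. Each move strictly lowers
    the sum of squared counts, so the loop terminates.
    """
    pebbles = list(pebbles)
    while max(pebbles) - min(pebbles) >= 2:
        hi = pebbles.index(max(pebbles))
        lo = pebbles.index(min(pebbles))
        pebbles[hi] -= 1
        pebbles[lo] += 1
    return pebbles

def interval_from_pebbles(pebbles):
    """Read the interval average off an evened-out configuration.

    If all counts equal m the average is exactly m; otherwise the counts are
    m and m + 1 and the average lies strictly between them.
    """
    lo, hi = min(pebbles), max(pebbles)
    if hi - lo > 1:
        raise ValueError("configuration is not evened out")
    return ("point", lo) if lo == hi else ("open", lo, hi)
```

Since moving pebbles never changes their total, the evened-out counts bracket the original average, which is exactly the information interval averaging asks for.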

The remainder of this chapter is structured as follows. First, we discuss the
problem of tracking, in a distributed way, the maximum of time-varying values at each
node. This is a seemingly unrelated problem, but it will be useful in the proof of
Theorem 8.1. After describing a solution to this problem, we give a formal algorithm
which implements the pebble matching idea above. This algorithm will use the
maximum tracking algorithm we developed as a subroutine to match up nodes with few
pebbles to nodes with a large number of pebbles.

8.1 Related literature 

A number of papers explored various tradeoffs associated with quantization of consen- 
sus schemes. We will not attempt a survey of the entire literature, but only mention 
papers that are most closely relevant to the algorithms presented here. The paper [51] 
proposed randomized gossip-type quantized averaging algorithms under the assump- 
tion that each agent value is an integer. They showed that these algorithms preserve 
the average of the values at each iteration and converge to approximate consensus. 
They also provided bounds on the convergence time of these algorithms for specific 
static topologies (fully connected and linear networks). We refer the reader also to 
the later papers [103] and [IQ]. A dynamic scheme which allows us to approximately
compute the average as the nodes communicate more and more bits with each other 
can be found in [2T]. In the recent work [32], Carli et al. proposed a distributed al- 
gorithm that uses quantized values and preserves the average at each iteration. They 
showed favorable convergence properties using simulations on some static topologies, 
and provided performance bounds for the limit points of the generated iterates. 

8.2 Computing and tracking maximal values 

We now describe an algorithm that tracks the maximum (over all nodes) of time- 
varying inputs at each node. It will be used as a subroutine later. The basic idea is 
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simple: every node keeps track of the largest value it has heard so far, and forwards 
this "intermediate result" to its neighbors. However, when an input value changes, 
the existing intermediate results need to be invalidated, and this is done by sending 
"restart" messages. A complication arises because invalidated intermediate results 
might keep circulating in the network, always one step ahead of the restart messages. 
We deal with this difficulty by "slowing down" the intermediate results, so that they 
travel at half the speed of the restart messages. In this manner, restart messages are 
guaranteed to eventually catch up with and remove invalidated intermediate results. 

We start by giving the specifications of the algorithm. Suppose that each node i
has a time-varying input u_i(t) stored in memory at time t, belonging to a finite set
of numbers U. We assume that, for each i, the sequence u_i(t) must eventually stop
changing, i.e., that there exists some T' such that

u_i(t) = u_i(T'), for all i and t > T'.

(However, node i need not ever be aware that u_i(t) has reached its final value.) Our
goal is to develop a distributed algorithm whose output eventually settles on the value
max_j u_j(T'). More precisely, each node i is to maintain a number M_i(t) which must
satisfy the following condition: for every network and any allowed sequences u_i(t),
there exists some T" with

M_i(t) = max_{j=1,...,n} u_j(t), for all i and t > T".

Moreover, each node i must also maintain a pointer P_i(t) to a neighbor or to
itself. We will use the notation P_i^2(t) = P_{P_i(t)}(t), P_i^3(t) = P_{P_i^2(t)}(t), etc. We require
the following additional property, for all t larger than T": for each node i there exists
a node j and a power K such that for all k > K we have P_i^k(t) = j and M_i(t) = u_j(t).
In words, by successively following the pointers P_i(t), one can arrive at a node with
a maximal value.
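
The pointer property can be illustrated with a short Python sketch. The function name `follow_pointers` and the representation of the pointers as a list `P` (with `P[i]` standing for P_i(t) at a fixed time t) are assumptions made for this illustration only; they are not part of the specification.

```python
def follow_pointers(P, i):
    """Follow i -> P[i] -> P[P[i]] -> ... until a node pointing to itself.

    Returns that node. Raises ValueError if the pointers loop without
    ever reaching a self-pointing node (which the specification rules
    out once the algorithm has settled).
    """
    seen = set()
    while P[i] != i:
        if i in seen:
            raise ValueError("pointers form a cycle")
        seen.add(i)
        i = P[i]
    return i


# Hypothetical settled state on three nodes: node 2 holds the maximum.
P = [1, 2, 2]
u = [1, 0, 3]
M = [3, 3, 3]
j = follow_pointers(P, 0)
print(j, M[0] == u[j])
```

In a settled state, the node reached from any starting node should hold an input equal to the common estimate.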

We next describe the algorithm. We will use the term slot t to refer, loosely 
speaking, to the interval between times t and t + 1. More precisely, during slot t each 
node processes the messages that have arrived at time t and computes the state at 
time t + 1 as well as the messages it will send at time t + 1. 

The variables M_i(t) and P_i(t) are a complete description of the state of node i
at time t. Our algorithm has only two types of messages that a node can send to 
its neighbors. Both are broadcasts, in the sense that the node sends them to every 
neighbor: 

1. "Restart!" 

2. "My estimate of the maximum is y," where y is some number in U chosen by 
the node.

[Figure 8-1: the flowchart itself did not survive text extraction. Surviving labels: the test "node i receives a message with value larger than M(t)?"; box O2, which sets M(t+1) = largest value received and P(t+1) = a node that sent that value; and the actions "At time t+1, broadcast 'restart' to all neighbors" and "At time t+1, broadcast 'my max is M(t)' to all neighbors".]


Figure 8-1: Flowchart of the procedure used by node i during slot t in the maximum
tracking algorithm. The subscript i is omitted, but u(t), M(t), and P(t) should be
understood as u_i(t), M_i(t), and P_i(t). In those cases where an updated value of M
or P is not indicated, it is assumed that M(t + 1) = M(t) and P(t + 1) = P(t).
The symbol is used to indicate no action. Note that the various actions indicated
are taken during slot t, but the messages determined by these actions are sent (and
instantaneously received) at time t + 1. Finally, observe that every node sends an
identical message to all its neighbors at every time t > 0. We note that the apparent
non-determinism in instruction O2 can be removed by picking a node with, say, the
smallest port label.

Initially, each node sets M_i(0) = u_i(0) and P_i(0) = i. At time t = 0, 1, ..., nodes
exchange messages, which then determine their state at time t + 1, i.e., the pair
M_i(t + 1), P_i(t + 1), as well as the messages to be sent at time t + 1. The procedure
node i uses to do this is described in Figure 8-1. One can verify that a memory size of
C log |U| + C log d(i) at each node i suffices, where C is an absolute constant. (This
is because M_i and P_i can take one of |U| and d(i) possible values, respectively.)
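
The procedure can be sketched as a small synchronous simulation in Python. This is an illustration under stated assumptions, not the thesis's own code: the branches O1, O3, O4a and O4b are reconstructed from how the proofs below refer to them (O1 and O4a reset to the node's own input and send a restart, O2 adopts a larger value and sends a restart, O3 and O4b forward the current estimate), ties in O2 are broken by the smallest node index, and the names `max_tracking`, `adj`, and `inputs` are hypothetical.

```python
RESTART = ("restart", None)


def max_tracking(adj, inputs, steps):
    """Simulate `steps` slots and return the lists (M, P) at time `steps`.

    adj    -- dict mapping each node 0..n-1 to a list of its neighbours
    inputs -- function t -> list of input values u_i(t)
    """
    n = len(adj)
    u_prev = list(inputs(0))      # convention u_i(-1) = u_i(0)
    M = list(u_prev)              # M_i(0) = u_i(0)
    P = list(range(n))            # P_i(0) = i
    msgs = [None] * n             # assumption: nothing is sent at time 0
    for t in range(steps):
        u = list(inputs(t))
        new_M, new_P, new_msgs = list(M), list(P), [None] * n
        for i in range(n):
            # Numeric estimates received at time t, keyed by sender.
            heard = {j: msgs[j][1] for j in adj[i]
                     if msgs[j] is not None and msgs[j] != RESTART}
            best = max(heard.values(), default=None)
            if u[i] != u_prev[i]:                      # O1: own input changed
                new_M[i], new_P[i], new_msgs[i] = u[i], i, RESTART
            elif best is not None and best > M[i]:     # O2: larger value heard
                sender = min(j for j, y in heard.items() if y == best)
                new_M[i], new_P[i], new_msgs[i] = best, sender, RESTART
            elif P[i] == i:                            # O3: forward estimate
                new_msgs[i] = ("max", M[i])
            elif msgs[P[i]] == RESTART:                # O4a: pointer restarted
                new_M[i], new_P[i], new_msgs[i] = u[i], i, RESTART
            else:                                      # O4b: forward estimate
                new_msgs[i] = ("max", M[i])
        M, P, msgs, u_prev = new_M, new_P, new_msgs, u
    return M, P


# Path 0 - 1 - 2; node 1 holds the maximum until its input drops at t = 3.
adj = {0: [1], 1: [0, 2], 2: [1]}
M, P = max_tracking(adj, lambda t: [1, 5, 3] if t < 3 else [1, 0, 3], 30)
print(M, P)
```

In this run the estimate 5 is purged by restarts after node 1's input drops, and the estimates then settle on the new maximum held by node 2.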

Note that while we assumed that agents can only transmit binary messages to each
other, the above algorithm requires transmissions of the values in {0, ..., K}. This
means that each step of the above algorithm takes log K real-time steps to implement.
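
As a rough illustration of this overhead, the following Python sketch counts the binary symbols needed to encode one value in {0, ..., K}. The function name `bits_per_value` is hypothetical, and the sketch ignores the extra symbol needed to distinguish a "restart" from a value message, so it matches the log K figure only up to constants.

```python
def bits_per_value(K):
    """Binary symbols needed to encode one value in {0, ..., K}."""
    # int.bit_length() gives the number of bits of K itself, which is
    # also enough for every smaller non-negative value.
    return max(1, K.bit_length())


for K in (1, 7, 8, 1000):
    print(K, bits_per_value(K))
```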

The result that follows asserts the correctness of the algorithm. The idea of the
proof is quite simple. Nodes maintain estimates M_i(t) which track the largest among
all the u_i(t) in the graph; these estimates are "slowly" forwarded by the nodes to their
neighbors, with many artificial delays along the way. Should some value u_j(t) change,
restart messages traveling without artificial delays are forwarded to any node which
thought j had a maximal value, causing those nodes to start over. The possibility
of cycling between restarts and forwards is avoided because restarts travel faster.
Eventually, the variables u_i(t) stop changing, and the algorithm settles on the correct
answer.

Theorem 8.2 (Correctness of the maximum tracking algorithm). Suppose that the
u_i(t) stop changing after some finite time. Then, for every network, there is a time
after which the variables P_i(t) and M_i(t) stop changing and satisfy M_i(t) = max_j u_j(t);
furthermore, after that time, and for every i, the node j = P_i^n(t) satisfies M_i(t) = u_j(t).

We now turn to the proof of Theorem 8.2.

In the following, we will occasionally use a convenient shorthand: we
will say that a statement S(t) holds eventually if there exists some T so that S(t) is
true for all t > T.

The analysis is made easier by introducing the time-varying directed graph G(t) =
({1, ..., n}, E(t)) where (i,j) ∈ E(t) if i ≠ j, P_i(t) = j, and there is no "restart"
message sent by j (and therefore received by i) during time t. We will abuse notation
by writing (i,j) ∈ G(t) to mean that the edge (i,j) belongs to the set of edges E(t)
at time t.

Lemma 8.1. Suppose that (i,j) ∉ G(t − 1) and (i,j) ∈ G(t). Then, i executes O2
during slot t − 1.

Proof. Suppose that (i,j) ∉ G(t − 1) and (i,j) ∈ G(t). If P_i(t − 1) = j, the definition
of G(t − 1) implies that j sent a restart message to i at time t − 1. Moreover, i cannot
execute O3 during the time slot t − 1 as this would require P_i(t − 1) = i. Therefore,
during this time slot, node i either executes O2 (and we are done^1) or it executes one
of O1 and O4a. For both of the latter two cases, we will have P_i(t) = i, so that (i,j)
will not be in G(t), a contradiction. Thus, it must be that P_i(t − 1) ≠ j. We now
observe that the only place in Figure 8-1 that can change P_i(t − 1) ≠ j to P_i(t) = j is
O2. □

^1 In fact, it can be shown that this case never occurs.

Lemma 8.2. In each of the following three cases, node i has no incoming edges in
either graph G(t) or G(t + 1):

(a) Node i executes O1, O2, or O4a during time slot t − 1;

(b) M_i(t) ≠ M_i(t − 1);

(c) For some j, (i,j) ∈ G(t) but (i,j) ∉ G(t − 1).

Proof. (a) If i executes O1, O2, or O4a during slot t − 1, then it sends a restart message
to each of its neighbors at time t. Then, for any neighbor j of i, the definition of G(t)
implies that (j,i) is not in G(t). Moreover, by Lemma 8.1, in order for (j,i) to be in
G(t + 1), node j must execute O2 during slot t. But the execution of O2 during slot
t cannot result in the addition of the edge (j,i) at time t + 1, because the message
broadcast by i at time t to its neighbors was a restart. So, (j,i) ∉ G(t + 1).

(b) If M_i(t) ≠ M_i(t − 1), then i executes O1, O2, or O4a during slot t − 1, so the
claim follows from part (a).

(c) By Lemma 8.1, it must be the case that node i executes O2 during slot t − 1,
and part (a) implies the result. □

Lemma 8.3. The graph G(t) is acyclic, for all t.

Proof. The initial graph G(0) does not contain a cycle. Let t be the first time a
cycle is present, and let (i,j) be an edge in a cycle that is added at time t, i.e.,
belongs to G(t) but not G(t − 1). Lemma 8.2(c) implies that i has no incoming edges
in G(t), so (i,j) cannot be an edge of the cycle, a contradiction. □

Note that every node has out-degree at most one, because P_i(t) is a single-valued
variable. Thus, the acyclic graph G(t) must be a forest, specifically, a collection of
disjoint trees, with all arcs of a tree directed so that they point towards a root of
the tree (i.e., a node with zero out-degree). The next lemma establishes that M_i is
constant on any path of G(t).
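
To make the definition of G(t) concrete, the following Python sketch builds the edge set E(t) from the pointers and checks the forest property. The names `build_edges` and `is_forest`, and the boolean list `restart_sent` (true for nodes that broadcast a restart at time t), are assumptions introduced for this illustration.

```python
def build_edges(P, restart_sent):
    """E(t): pairs (i, j) with i != j, P[i] == j, and no restart sent by j."""
    return {(i, j) for i, j in enumerate(P)
            if i != j and not restart_sent[j]}


def is_forest(n, edges):
    """True if the out-degree-at-most-one graph on n nodes has no cycle."""
    parent = dict(edges)              # each node has at most one out-edge
    for start in range(n):
        seen = set()
        node = start
        while node in parent:         # walk towards a root
            if node in seen:
                return False          # came back to a visited node
            seen.add(node)
            node = parent[node]
    return True


edges = build_edges([1, 2, 2], [False, False, False])
print(sorted(edges), is_forest(3, edges))
```

Because each node has at most one outgoing edge, following edges from any node either ends at a root or revisits a node, which is what the check exploits.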

Lemma 8.4. If (i,j) ∈ G(t), then M_i(t) = M_j(t).

Proof. Let t' be a time when (i,j) is added to the graph, or more precisely, a time
such that (i,j) ∈ G(t') but (i,j) ∉ G(t' − 1). First, we argue that the statement we
want to prove holds at time t'. Indeed, Lemma 8.1 implies that during slot t' − 1,
node i executed O2, so that M_i(t') = M_j(t' − 1). Moreover, M_j(t' − 1) = M_j(t'),
because otherwise case (b) in Lemma 8.2 would imply that j has no incoming edges
at time t', contradicting our assumption that (i,j) ∈ G(t').

Next, we argue that the property M_i(t) = M_j(t) continues to hold, starting from
time t' and for as long as (i,j) ∈ G(t). Indeed, as long as (i,j) ∈ G(t), M_j(t)
remains unchanged, by case (b) of Lemma 8.2. To argue that M_i(t) also remains
unchanged, we simply observe that in Figure 8-1, every box which leads to a change
in M_i also sets P_i either to i or to the sender of a message with value strictly larger than
M_i; this latter message cannot come from j because, as we just argued, increases in M_j
lead to removal of the edge (i,j) from G(t). So, changes in M_i are also accompanied
by removal of the edge (i,j) from G(t). □

For the purposes of the next lemma, we use the convention u_i(−1) = u_i(0).

Lemma 8.5. If P_i(t) = i, then M_i(t) = u_i(t − 1); if P_i(t) ≠ i, then M_i(t) > u_i(t − 1).

Proof. We prove this result by induction. Because of the convention u_i(−1) = u_i(0),
and the initialization P_i(0) = i, M_i(0) = u_i(0), the result trivially holds at time
t = 0. Suppose now that the result holds at time t. During time slot t, we have three
possibilities for node i:

(i) Node i executes O1 or O4a. In this case, M_i(t + 1) = u_i(t), P_i(t + 1) = i, so the
result holds at time t + 1.

(ii) Node i executes O2. In this case P_i(t + 1) ≠ i and M_i(t + 1) > M_i(t) ≥ u_i(t − 1) =
u_i(t). The first inequality follows from the condition for entering step O2. The
second follows from the induction hypothesis. The last equality follows because
if u_i(t) ≠ u_i(t − 1), node i would have executed O1 rather than O2. So, once
again, the result holds at time t + 1.

(iii) Node i executes O3 or O4b. The result holds at time t + 1 because neither u_i
nor M_i changes.

□

In the sequel, we will use T' to refer to a time after which all the u_i are constant.
The following lemma shows that, after T', the largest estimate does not increase.

Lemma 8.6. Suppose that at some time t' > T' we have M ≥ max_i M_i(t'). Then

M ≥ max_i M_i(t) (8.1)

for all t ≥ t'.

Proof. We prove Eq. (8.1) by induction. By assumption it holds at time t = t'.
Suppose now Eq. (8.1) holds at time t; we will show that it holds at time t + 1.

Consider a node i. If it executes O2 during the slot t, it sets M_i(t + 1) to the
value contained in a message sent at time t by some node j. It follows from the rules
of our algorithm that the value in this message is M_j(t) and therefore, M_i(t + 1) =
M_j(t) ≤ M.

Any operation other than O2 that modifies M_i sets M_i(t + 1) = u_i(t), and since
u_i(t) does not change after time T', we have M_i(t + 1) = u_i(t − 1). By Lemma 8.5,
M_i(t) ≥ u_i(t − 1), so that M_i(t + 1) ≤ M_i(t). We conclude that M_i(t + 1) ≤ M holds
for this case as well. □

We now introduce some terminology used to specify whether the estimate M_i(t)
held by a node has been invalidated or not. Formally, we say that node i has a valid
estimate at time t if by following the path in G(t) that starts at i, we eventually
arrive at a node r with P_r(t) = r and M_i(t) = u_r(t − 1). In any other case, we say
that a node has an invalid estimate at time t.

Remark: Because of the acyclicity property, a path in G(t), starting from a
node i, eventually leads to a node r with out-degree 0; it follows from Lemma 8.4
that M_r(t) = M_i(t). Moreover, Lemma 8.5 implies that if P_r(t) = r, then M_i(t) =
M_r(t) = u_r(t − 1), so that the estimate is valid. Therefore, if i has an invalid estimate,
the corresponding node r must have P_r(t) ≠ r; on the other hand, since r has
out-degree 0 in G(t), the definition of G(t) implies that there is a "restart" message from
P_r(t) to r sent at time t.
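
The definition of a valid estimate can be turned into a short check, sketched here in Python. The function name `has_valid_estimate` and the arguments `u_prev` (the inputs u_r(t − 1)) and `restart_sent` (which nodes broadcast a restart at time t) are hypothetical names chosen for this illustration.

```python
def has_valid_estimate(i, P, M, u_prev, restart_sent):
    """Follow the path in G(t) from i to its root r and test validity.

    An edge (r, P[r]) is in G(t) only if P[r] != r and P[r] did not
    send a restart at time t; the walk stops when no such edge exists.
    """
    r = i
    for _ in range(len(P)):           # G(t) is acyclic, so n steps suffice
        nxt = P[r]
        if nxt == r or restart_sent[nxt]:
            break                     # r has out-degree 0 in G(t)
        r = nxt
    return P[r] == r and M[i] == u_prev[r]


P, M, u_prev = [1, 2, 2], [3, 3, 3], [1, 0, 3]
print(has_valid_estimate(0, P, M, u_prev, [False, False, False]))
print(has_valid_estimate(0, P, M, u_prev, [False, False, True]))
```

In the second call node 2 has sent a restart, so the path from node 0 ends at node 1, whose pointer is not to itself, and the estimate is reported as invalid.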

The following lemma gives some conditions which allow us to conclude that a
given node has reached a final state.

Lemma 8.7. Fix some t' > T' and let M* be the largest estimate at time t', i.e.,
M* = max_i M_i(t'). If M_i(t') = M*, and this estimate is valid, then for all t ≥ t':

(a) M_i(t) = M*, P_i(t) = P_i(t'), and node i has a valid estimate at time t.

(b) Node i executes either O3 or O4b at time t.

Proof. We will prove this result by induction on t. Fix some node i. By assumption,
part (a) holds at time t = t'. To show part (b) at time t = t', we first argue that
i does not execute O2 during the time slot t. Indeed, this would require i to have
received a message with an estimate strictly larger than M*, sent by some node j who
executed O3 or O4b during the slot t − 1. In either case, M* < M_j(t − 1) = M_j(t),
contradicting the definition of M*. Because of the definition of T', u_i(t) = u_i(t − 1)
for t > T', so that i does not execute O1. This concludes the proof of the base case.

Next, we suppose that our result holds at time t, and we will argue that it holds
at time t + 1. If P_i(t) = i, then i executes O3 during slot t, so that M_i(t + 1) = M_i(t)
and P_i(t + 1) = P_i(t), which completes the induction step for this case.

It remains to consider the case where P_i(t) = j ≠ i. It follows from the definition
of a valid estimate that (i,j) ∈ E(t). Using the definition of E(t), we conclude that
there is no restart message sent from j to i at time t. By the induction hypothesis,
during the slot t − 1, j has thus executed O3 or O4b, so that M_j(t − 1) = M_j(t);
in fact, Lemma 8.4 gives that M_j(t) = M_i(t) = M*. Thus during slot t, i reads
a message from j = P_i(t) with the estimate M*, and executes O4b, consequently
leaving its M_i and P_i unchanged.

We finally argue that node i's estimate remains valid. This is the case because
we can apply the arguments of the previous paragraph to every node j on the
path from i to a node with out-degree 0; we obtain that all of these nodes both (i)
keep P_j(t + 1) = P_j(t) and (ii) execute O3 or O4b, and consequently do not send out
any restart messages. □

Recall (see the comments following Lemma 8.3) that G(t) consists of a collection
of disjoint in-trees (trees in which all edges are oriented towards a root node).
Furthermore, by Lemma 8.4, the value of M_i(t) is constant on each of these trees.
Finally, all nodes on a particular tree have either a valid or invalid estimate (the
estimate being valid if and only if P_r(t) = r and M_r(t) = u_r(t − 1) at the root node r
of the tree). For any z ∈ U, we let G_z(t) be the subgraph of G(t) consisting of those
trees at which all nodes have M_i(t) = z and for which the estimate z on that tree
is invalid. We refer to G_z(t) as the invalidity graph of z at time t. In the sequel we
will say that i is in G_z(t), and abuse notation by writing i ∈ G_z(t), to mean that i
belongs to the set of nodes of G_z(t). The lemmas that follow aim at showing that the
invalidity graph of the largest estimate eventually becomes empty. Loosely speaking,
the first lemma asserts that after the u_i(t) have stopped changing, it takes essentially
two time steps for a maximal estimate M* to propagate to a neighboring node.

Lemma 8.8. Fix some time t > T'. Let M* be the largest estimate at that time,
i.e., M* = max_i M_i(t). Suppose that i is in G_{M*}(t + 2) but not in G_{M*}(t). Then

P_i(t + 2) ∈ G_{M*}(t).

Proof. The fact that i ∉ G_{M*}(t) implies that either (i) M_i(t) ≠ M* or (ii) M_i(t) = M*
and i has a valid estimate at time t. In the latter case, it follows from Lemma 8.7 that
i also has a valid estimate at time t + 2, contradicting the assumption i ∈ G_{M*}(t + 2).
Therefore, we can and will assume that M_i(t) < M*. Since t > T', no node ever
executes O1. The difference between M_i(t) and M_i(t + 2) = M* can only result from
the execution of O2 or O4a by i during time slot t or t + 1.

Node i cannot have executed O4a during slot t + 1, because this would result
in P_i(t + 2) = i, and i would have a valid estimate at time t + 2, contradicting the
assumption i ∈ G_{M*}(t + 2). Similarly, if i executes O4a during slot t it sets P_i(t + 1) = i.
Unless it executes O2 during slot t + 1, we have again P_i(t + 2) = i, contradicting the
assumption i ∈ G_{M*}(t + 2). Therefore, i must have executed O2 during either slot
t + 1 or slot t, and in the latter case it must not have executed O4a during slot t + 1.

Let us suppose that i executes O2 during slot t + 1, and thus sets P_i(t + 2) = j
for some j that sent at time t + 1 a message with the estimate M* = M_i(t + 2).
The rules of the algorithm imply that M* = M_j(t + 1). We can also conclude that
M_j(t + 1) = M_j(t), since if this were not true, node j would have sent out a restart
at time t + 1. Thus M_j(t) = M*. It remains to prove that the estimate M* of j
at time t is not valid. Suppose, to obtain a contradiction, that it is valid. Then it
follows from Lemma 8.7 that j also has a valid estimate at time t + 2, and from the
definition of validity that the estimate of i is also valid at t + 2, in contradiction with
the assumption i ∈ G_{M*}(t + 2). Thus we have established that P_i(t + 2) = j ∈ G_{M*}(t)
if i executes O2 during slot t + 1. The same argument applies if i executes O2 during
slot t, without executing O4a or O2 during the slot t + 1, using the fact that in this
case P_i(t + 2) = P_i(t + 1). □

Loosely speaking, the next lemma asserts that the removal of an invalid maximal
estimate M*, through the propagation of restarts, takes place at unit speed.

Lemma 8.9. Fix some time t > T', and let M* be the largest estimate at that
time. Suppose that i is a root (i.e., has zero out-degree) in the forest G_{M*}(t + 2).
Then, either (i) i is the root of a one-element tree in G_{M*}(t + 2) consisting only of
i, or (ii) i is at least "two levels down" in G_{M*}(t), i.e., there exist nodes i', i" with

(i, i'), (i', i") ∈ G_{M*}(t).

Proof. Consider such a node i and assume that (i) does not hold. Then,

M_i(t) = M_i(t + 1) = M_i(t + 2) = M*,
P_i(t) = P_i(t + 1) = P_i(t + 2). (8.2)

This is because otherwise, cases (a) and (b) of Lemma 8.2 imply that i has zero
in-degree in G_{M*}(t + 2) ⊆ G(t + 2), in addition to having a zero out-degree, contradicting
our assumption that (i) does not hold. Moreover, the estimate of i is not valid at
t, because it would then also be valid at t + 2 by Lemma 8.7, in contradiction with
i ∈ G_{M*}(t + 2). Therefore, i belongs to the forest G_{M*}(t). Let r be the root of the
connected component to which i belongs. We will prove that i ≠ r and P_i(t) ≠ r,
and thus that (ii) holds.

Since r ∈ G_{M*}(t), we have M_r(t) = M* and thus r does not execute O2
during slot t. Moreover, r is a root and has an invalid estimate, so P_r(t) ≠ r and
there is a "restart" message from P_r(t) to r at time t. Therefore, r executes O4a
during slot t, setting P_r(t + 1) = r and sending "restart" messages to all its neighbors
at time t + 1. This implies that i ≠ r, as we have seen that P_i(t) = P_i(t + 1). Let
us now assume, to obtain a contradiction, that P_i(t) = r and thus, by Eq. (8.2),
P_i(t + 2) = P_i(t + 1) = r. In that case, we have just seen that there is at time t + 1 a
"restart" message from r = P_i(t + 1) ≠ i to i, so i executes O4a during slot t + 1 and
sets P_i(t + 2) = i. This however contradicts the fact P_i(t + 2) = P_i(t + 1). Therefore,
r ≠ i and r ≠ P_i(t), i.e., i is "at least two levels down" in G_{M*}(t). □

Let the depth of a tree be the largest distance between a leaf of the tree and the
root; the depth of a forest is the largest depth of any tree in the forest. We will use
g(·) to denote depth. The following lemma uses the previous two lemmas to assert
that a forest carrying an invalid maximal estimate has its depth decrease by at least
one over a time interval of length two.
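
For concreteness, the depth of a forest of in-trees can be computed as in the Python sketch below. The function name `forest_depth` and the representation of the forest as a dictionary `parent` mapping each non-root node to the node it points to are assumptions of this illustration, as is the convention that an empty forest has depth 0.

```python
def forest_depth(parent):
    """Largest number of edges on a path from any node to its root.

    parent -- dict: non-root node -> the node its single out-edge points to
    (assumed acyclic, as Lemma 8.3 guarantees for G(t)).
    """
    def distance_to_root(v):
        d = 0
        while v in parent:
            v = parent[v]
            d += 1
        return d

    nodes = set(parent) | set(parent.values())
    return max((distance_to_root(v) for v in nodes), default=0)


print(forest_depth({0: 1, 1: 2, 3: 2}))   # path 0 -> 1 -> 2 has two edges
```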

Lemma 8.10. Fix some time t > T', and let M* be the largest estimate value at that
time. If g(G_{M*}(t + 2)) > 0, then g(G_{M*}(t + 2)) ≤ g(G_{M*}(t)) − 1.

Proof. Suppose that g(G_{M*}(t + 2)) > 0. Let us fix a leaf i and a root j in the forest
G_{M*}(t + 2) such that the length of the path from i to j is equal to the depth of
G_{M*}(t + 2). Let i' be the single neighbor of node i in G_{M*}(t + 2). We first claim
that every edge (k, k') on the path from i' to j in G_{M*}(t + 2) was also present in
G_{M*}(t). Indeed, by Lemma 8.2, the appearance of a new edge (k, k') at time t + 1
or t + 2 implies that node k has in-degree 0 in G(t + 2), which contradicts k being
an intermediate node on the path from i to j in G_{M*}(t + 2). The same argument
establishes that M_k(t) = M_k(t + 1) = M*. Finally, the estimate of k at time t is
invalid, for if it were valid, it would still be valid at time t + 2 by Lemma 8.7, so i
would also have a valid estimate at time t + 2, which is false by assumption. Thus we
have just established that both the node k and its edge (k, k') at time t + 2 belong
to G_{M*}(t).

Thus the graph G_{M*}(t) includes a path from i' to j of length g(G_{M*}(t + 2)) − 1.
Moreover, by Lemma 8.9, we know that at time t some edges (j, j') and (j', j") were
present in G_{M*}(t), so the path length from i' to j" is at least g(G_{M*}(t + 2)) + 1. This
proves that g(G_{M*}(t)) ≥ g(G_{M*}(t + 2)) + 1 and the lemma. □

The following lemma analyzes the remaining case of invalidity graphs with zero
depth. It shows that the invalidity graph will be empty two steps after its depth
reaches zero.

Lemma 8.11. Fix some time t > T', and let M* be the largest estimate value at that
time. If G_{M*}(t + 2) is not empty, then g(G_{M*}(t + 1)) > 0 or g(G_{M*}(t)) > 0.

Proof. Let us take a node i ∈ G_{M*}(t + 2) and let j = P_i(t + 2). It follows from the
definition of a valid estimate that j ≠ i. This implies that i did not execute O4a (or
O1) during slot t + 1. We treat two cases separately:

(i) Node i did not execute O2 during slot t + 1. In this case, P_i(t + 1) = P_i(t + 2) = j
and M_i(t + 1) = M_i(t + 2) = M*. Besides, there is no "restart" message from
j = P_i(t + 1) to i at time t + 1, for otherwise i would have executed O4a during slot
t + 1, which we know it did not. Therefore, (i,j) ∈ E(t + 1) by definition of G(t), and
M_j(t + 1) = M_i(t + 1) = M* by Lemma 8.4. Moreover, neither i nor j has a valid
estimate, for otherwise Lemma 8.7 would imply that they both hold the same valid
estimate at t + 2, in contradiction with i ∈ G_{M*}(t + 2). So the edge (i,j) is present
in G_{M*}(t + 1), which thus has a positive depth.

(ii) Node i did execute O2 during slot t + 1. In that case, there was a message with
the value M_i(t + 2) = M* from j to i at time t + 1, which implies that M_j(t + 1) =
M_i(t + 2) = M*. This implies that j did not execute operation O2 during slot t.
Moreover, node j did not have a valid estimate at time t + 1. Otherwise, part (a)
of Lemma 8.7 implies that j has a valid estimate at time t + 2, and part (b) of
the same lemma implies there was not a "restart" message from j at t + 2, so that
(i,j) ∈ E(t + 2). This would in turn imply that i has a valid estimate at time t + 2,
contradicting i ∈ G_{M*}(t + 2). To summarize, j has an invalid estimate M* at time
t + 1 and did not execute O2 during slot t. We now simply observe that the argument
of case (i) applies to j at time t + 1. □

The next lemma asserts that the largest invalid estimates are eventually purged,
and thus that eventually, all remaining largest estimates are valid.

Lemma 8.12. Fix some time t > T', and let M* be the largest estimate value at that
time. Eventually G_{M*}(t) is empty.

Proof. Lemma 8.10 implies there is a time t' > T' after which g(G_{M*}(t")) = 0 for all
t" > t'. Lemma 8.11 then implies that G_{M*}(t) is empty for all t > t' + 2. □

We are now ready for the proof of the main theorem. 

Proof of Theorem 8.2. Let M = max_i u_i(T'). It follows from the definition of a valid
estimate that any node holding an estimate M_i(t) > M at time t > T' has an invalid
estimate. Applying Lemma 8.12 repeatedly shows the existence of some time T > T'
such that when t > T, no node has an estimate larger than M, and every node having
an estimate M has a valid estimate.

We will assume that the time t in every statement we make below satisfies t > T.
Define Z(t) as the set of nodes having the estimate M at time t. Every node in
Z(t) holds a valid estimate, and Z(t) is never empty because Lemma 8.5 implies that
M_i(t) ≥ M for every i with u_i = M. Moreover, it follows from Lemma 8.7 and the
definition of validity that any node belonging to some Z(t) will forever afterwards
maintain M_i(t) = M and will satisfy the conclusion of Theorem 8.2.

We conclude the proof by arguing that eventually every node is in Z(t). In
particular, we will argue that a node i adjacent to a node j ∈ Z(t) necessarily belongs to
Z(t + 2). Indeed, it follows from Lemma 8.7 that node j sends to i a message with
the estimate M at time t + 1. If i ∈ Z(t + 1), then i ∈ Z(t + 2); else M_i(t + 1) < M,
i executes O2 during slot t + 1, and sets M_i(t + 2) = M, so indeed i ∈ Z(t + 2). □

8.3 Interval-averaging

In this section, we present an interval-averaging algorithm and prove its correctness.
We start by repeating the informal discussion of its main idea from the beginning
of the chapter. Imagine the integer input value x_i as represented by a number of x_i
pebbles at node i. The algorithm attempts to exchange pebbles between nodes with
unequal numbers so that the overall distribution becomes more even. Eventually,
either all nodes will have the same number of pebbles, or some will have a certain
number and others just one more. We let u_i(t) be the current number of pebbles at
node i; in particular, u_i(0) = x_i. An important property of the algorithm will be that
the total number of pebbles is conserved.

To match nodes with unequal number of pebbles, we use the maximum tracking
algorithm of Section 8.2. Recall that the algorithm provides nodes with pointers
which attempt to track the location of the maximal values. When a node with u_i
pebbles comes to believe in this way that a node with at least u_i + 2 pebbles exists,
it sends a request in the direction of the latter node to obtain one or more pebbles.
This request follows a path to a node with a maximal number of pebbles until the
request either gets denied, or gets accepted by a node with at least u_i + 2 pebbles.
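
The pebble picture can be illustrated with a centralized Python sketch. This is not the distributed protocol described in this section: it simply moves one pebble at a time from a fullest node to an emptiest node, with global knowledge, to show the conserved total and the final configuration in which pebble counts differ by at most one. The function name `balance_pebbles` is hypothetical.

```python
def balance_pebbles(x):
    """Centralized illustration: even out pebble counts one pebble at a time."""
    x = list(x)
    while max(x) - min(x) >= 2:
        hi = x.index(max(x))      # a node with the most pebbles
        lo = x.index(min(x))      # a node with the fewest pebbles
        x[hi] -= 1                # a transfer conserves the total
        x[lo] += 1
    return x


start = [0, 7, 3]
final = balance_pebbles(start)
print(final, sum(final) == sum(start))
```

Each transfer strictly decreases the sum of squared pebble counts, so the loop terminates; the distributed algorithm has to reach the same kind of configuration using only local requests and acceptances.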

8.3.1 The algorithm 

The algorithm uses two types of messages. Each type of message can be either 
originated at a node or forwarded by a node. 

(a) (Request, r): This is a request for a transfer of pebbles. Here, r is an integer
that represents the number of pebbles u_i(t) at the node i that first originated the
request, at the time t that the request was originated. (Note, however, that this
request is actually sent at time t + 1.)

(b) (Accept, w): This corresponds to acceptance of a request, and a transfer of
w pebbles towards the node that originated the request. An acceptance with a value
w = 0 represents a request denial.

As part of the algorithm, the nodes run the maximum tracking algorithm of
Section 8.2, as well as a minimum tracking counterpart. In particular, each node i has
access to the variables M_i(t) and P_i(t) of the maximum tracking algorithm (recall
that these are, respectively, the estimated maximum and a pointer to a neighbor or
to itself). Furthermore, each node maintains three additional variables.

(a) "Mode(t)" ∈ {Free, Blocked}. Initially, the mode of every node is free. A
node is blocked if it has originated or forwarded a request, and is still waiting to hear
whether the request is accepted (or denied).

(b) "Rin_i(t)" and "Rout_i(t)" are pointers to a neighbor of i, or to i itself. The
meaning of these pointers when in blocked mode is as follows. If Rout_i(t) = j,
then node i has sent (either originated or forwarded) a request to node j, and is
still in blocked mode, waiting to hear whether the request is accepted or denied. If
Rin_i(t) = k, and k ≠ i, then node i has received a request from node k but has not
yet responded to node k. If Rin_i(t) = i, then node i has originated a request and is
still in blocked mode, waiting to hear whether the request is accepted or denied.

A precise description of the algorithm is given in Figure 8-2. The proof of
correctness is given in the next subsection, thus also establishing Theorem 8.1. Furthermore,
we will show that the time until the algorithm settles on the correct output is of order
O(n^2 K^2 log K).

8.3.2 Proof of correctness 

We begin by arguing that the rules of the algorithm preclude one potential obstacle; 
we will show that nodes will not get stuck sending requests to themselves. 

Lemma 8.13. A node never sends (originates or forwards) a request to itself. More
precisely, Rout_i(t) ≠ i, for all i and t.

Proof. By inspecting the first two cases for the free mode, we observe that if node i
originates a request during time slot t (and sends a request message at time t + 1),
then P_i(t) ≠ i. Indeed, to send a message, it must be true that M_i(t) > u_i(t) = u_i(t − 1).
However, any action of the maximum tracking algorithm that sets P_i(t) = i also sets
M_i(t) = u_i(t − 1), and moreover, as long as P_i doesn't change neither does M_i. So
the recipient P_i(t) of the request originated by i is different than i, and accordingly,
Rout_i(t + 1) is set to a value different than i. We argue that the same is true for the
case where Rout_i is set by the "Forward request" box of the free mode. Indeed,
that box is enabled only when u_i(t) = u_i(t − 1) and u_i(t) − 1 < r < M_i(t) − 1, so
that u_i(t − 1) < M_i(t). As in the previous case, this implies that P_i(t) ≠ i and that
Rout_i(t + 1) is again set to a value other than i. We conclude that Rout_i(t) ≠ i for
all i and t. □

We will now analyze the evolution of the requests. A request is originated at
some time τ by some originator node ℓ who sets Rin_ℓ(τ + 1) = ℓ and sends the
request to some node i = Rout_ℓ(τ + 1) = P_ℓ(τ). The recipient i of the request either
accepts/denies it, in which case Rin_i remains unchanged, or forwards it while also
setting Rin_i(τ + 2) to ℓ. The process then continues similarly. The end result is that
at any given time t, a request initiated by node ℓ has resulted in a "request path of
node ℓ at time t," which is a maximal sequence of nodes ℓ, i_1, ..., i_k with Rin_ℓ(t) = ℓ,
Rin_{i_1}(t) = ℓ, and Rin_{i_m}(t) = i_{m−1} for m ≤ k.

Lemma 8.14. At any given time, different request paths cannot intersect (they involve
disjoint sets of nodes). Furthermore, at any given time, a request path cannot visit
the same node more than once.

Proof. For any time t, we form a graph that consists of all edges that lie on some
request path. Once a node i is added to some request path, and as long as that
request path includes i, node i remains in blocked mode and the value of Rin_i cannot
change. This means that adding a new edge that points into i is impossible. This
readily implies that cycles cannot be formed and also that two request paths cannot
involve a common node. □

We use $p_\ell(t)$ to denote the request path of node $\ell$ at time $t$, and $s_\ell(t)$ to denote
the last node on this path. We will say that a request originated by node $\ell$ terminates
when node $\ell$ receives an (Accept, $w$) message, with any value $w$.

Lemma 8.15. Every request eventually terminates. Specifically, if node $\ell$ originates
a request at time $t'$ (and sends a request message at time $t' + 1$), then there exists
a later time $t'' \leq t' + n$ at which node $s_\ell(t'')$ receives an "accept request" message
(perhaps with $w = 0$), which is forwarded until it reaches $\ell$, no later than time $t'' + n$.

Proof. By the rules of our algorithm, node $\ell$ sends a request message to node $P_\ell(t')$
at time $t' + 1$. If node $P_\ell(t')$ replies at time $t' + 2$ with a "deny request" response to $\ell$'s
request, then the claim is true; otherwise, observe that $p_\ell(t' + 2)$ is nonempty and until
$s_\ell(t)$ receives an "accept request" message, the length of $p_\ell(t)$ increases at each time
step. Since this length cannot be larger than $n - 1$, by Lemma 8.14, it follows that
$s_\ell(t)$ receives an "accept request" message at most $n$ steps after $\ell$ initiated the request.
One can then easily show that this acceptance message is forwarded backwards along
the path (and the request path keeps shrinking) until the acceptance message reaches
$\ell$, at most $n$ steps later. □

The arguments so far had mostly to do with deadlock avoidance. The next lemma 
concerns the progress made by the algorithm. Recall that a central idea of the al- 
gorithm is to conserve the total number of "pebbles," but this must include both 
pebbles possessed by nodes and pebbles in transit. We capture the idea of "pebbles 
in transit" by defining a new variable. If $\ell$ is the originator of some request path that
is present at time $t$, and if the final node $s_\ell(t)$ of that path receives an (Accept, $w$)
message at time $t$, we let $w_\ell(t)$ be the value $w$ in that message. (This convention
includes the special case where $w = 0$, corresponding to a denial of the request.) In
all other cases, we set $w_\ell(t) = 0$. Intuitively, $w_\ell(t)$ is the value that has already been
given away by a node who answered a request originated by node $\ell$, and that will
eventually be added to $u_\ell$, once the answer reaches $\ell$.

We now define 

$$\hat{u}_i(t) = u_i(t) + w_i(t).$$

By the rules of our algorithm, if $w_i(t) = w > 0$, an amount $w$ will eventually be added
to $u_i$, once the acceptance message is forwarded back to $i$. The value $\hat{u}_i$ can thus be
seen as a future value of $u_i$, that includes its present value and the value that has
been sent to $i$ but has not yet reached it.

[Flowchart. Boxes, grouped by mode:

Free mode:
  Originate request: send (Request, $u(t)$) to $P(t)$; Mode$(t+1)$ = Blocked; Rout$(t+1) = P(t)$; Rin$(t+1) = i$.
  If requests were received: $r$ := smallest of the request values; $k$ := a node that sent a request with value $r$. Then one of:
    Accept request: $w = \lfloor (u(t) - r)/2 \rfloor$; $u(t+1) = u(t) - w$; send (Accept, $w$) to $k$.
    Forward request: send (Request, $r$) to $P(t)$; Mode$(t+1)$ = Blocked; Rout$(t+1) = P(t)$; Rin$(t+1) = k$.
    Deny the request of $k$.

Blocked mode:
  Deny all incoming requests.
  Fulfill the acceptance: $u(t+1) = u(t) + w$; Mode$(t+1)$ = Free.
  Forward the acceptance: forward (Accept, $w$) to Rin$(t)$; Mode$(t+1)$ = Free.]

Figure 8-2: Flowchart of the procedure used by node $i$ during slot $t$ in the interval-
averaging algorithm. The subscript $i$ is omitted from variables such as Mode$(t)$,
etc. Variables for which an update is not explicitly indicated are assumed to remain
unchanged. "Denying a request" is a shorthand for $i$ sending a message of the form
(Accept, 0) at time $t + 1$ to a node from which $i$ received a request at time $t$. Note
also that "forward the acceptance" in the blocked mode includes the case where the
answer had $w = 0$ (i.e., it was a request denial), in which case the denial is forwarded.
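
The reply of a free node to an incoming request can be illustrated in code. The following is a minimal Python sketch under stated assumptions, not the implementation used in this thesis: the function name `free_mode_response` and the tuple encoding of the reply are hypothetical; the acceptance test $u(t) - r \geq 2$ and the floor rule for $w$ are the ones used in the proof of Lemma 8.16; the forwarding test ($u(t) = u(t-1)$ and $r < M(t) - 1$) is the one quoted in the proof that $\mathrm{Rout}_i(t) \neq i$; treating every remaining case as a denial is an assumption.

```python
def free_mode_response(u, u_prev, M, r):
    """Reply of a free node to the smallest incoming request value r.

    u      -- the node's current value u(t)
    u_prev -- its previous value u(t-1)
    M      -- its current estimate M(t) of the maximum
    r      -- smallest request value received during this slot
    Returns (action, w); w is nonzero only for an acceptance.
    """
    if u - r >= 2:
        # Accept: give away half of the gap, rounded down.
        w = (u - r) // 2
        return ("accept", w)
    if u == u_prev and r < M - 1:
        # Here u - 1 <= r already holds, so this is the forwarding case.
        return ("forward", 0)
    # Assumption: anything else is answered with a denial, i.e. (Accept, 0).
    return ("deny", 0)
```

For instance, a node with value 5 that receives a request of value 1 accepts with $w = 2$, while a node with value 3 and estimate 6 that receives a request of value 2 forwards it.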

The rules of our algorithm imply that the sum of the $\hat{u}_i$ remains constant. Let $\bar{x}$
be the average of the initial values $x_i$. Then,

$$\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i(t) = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}.$$

We define the variance function $V$ as

$$V(t) = \sum_{i=1}^{n} \left( \hat{u}_i(t) - \bar{x} \right)^2.$$


Lemma 8.16. The number of times that a node can send an acceptance message
(Accept, $w$) with $w \neq 0$ is finite.

Proof. Let us first describe the idea behind the proof. Suppose that nodes could 
instantaneously transfer value to each other. It is easily checked that if a node i 
transfers an amount $w^*$ to a node $j$ with $u_i - u_j \geq 2$ and $1 \leq w^* \leq \frac{1}{2}(u_i - u_j)$,
the variance $\sum_i (u_i - \bar{x})^2$ decreases by at least 2. Thus, there can only be a finite
number of such transfers. In our model, the situation is more complicated because 
transfers are not immediate and involve a process of requests and acceptances. A key 
element of the argument is to realize that the algorithm can be interpreted as if it 
only involved instantaneous exchanges involving disjoint pairs of nodes. 
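
The claim that an instantaneous transfer lowers the variance by at least 2 can be checked numerically. Because the sum of the values is conserved, the change in $\sum_i (u_i - \bar{x})^2$ equals the change in $\sum_i u_i^2$, and for a transfer of $w$ from $u_i$ to $u_j$ this change is $-2w(u_i - u_j - w)$. The Python sketch below enumerates all admissible integer transfers in a small range; the function names and the range bound are illustrative assumptions, not part of the algorithm.

```python
def variance_drop(u_i, u_j, w):
    """Decrease of u_i**2 + u_j**2 when the node holding u_i gives w to u_j."""
    return u_i**2 + u_j**2 - (u_i - w)**2 - (u_j + w)**2


def smallest_drop(max_value):
    """Minimum drop over all admissible transfers with values in [0, max_value].

    Admissible means u_i - u_j >= 2 and 1 <= w <= (u_i - u_j) / 2.
    Requires max_value >= 2 so that at least one transfer exists.
    """
    drops = [
        variance_drop(u_i, u_j, w)
        for u_i in range(max_value + 1)
        for u_j in range(max_value + 1)
        if u_i - u_j >= 2
        for w in range(1, (u_i - u_j) // 2 + 1)
    ]
    return min(drops)
```

The minimum is attained when $u_i - u_j = 2$ and $w = 1$, which gives a drop of exactly 2.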

Let us consider the difference $V(t+1) - V(t)$ at some typical time $t$. Changes
in $V$ are solely due to changes in the $\hat{u}_i$. Note that if a node $i$ executes the "fulfill
the acceptance" instruction at time $t$, node $i$ was the originator of the request and
the request path has length zero, so that it is also the final node on the path, and
$s_i(t) = i$. According to our definition, $w_i(t)$ is the value $w$ in the message received by
node $s_i(t) = i$. At the next time step, we have $w_i(t+1) = 0$ but $u_i(t+1) = u_i(t) + w$.
Thus, $\hat{u}_i$ does not change, and the function $V$ is unaffected.

By inspecting the algorithm, we see that a nonzero difference $V(t+1) - V(t)$ is
possible only if some node $i$ executes the "accept request" instruction at slot $t$, with
some particular value $w^* \neq 0$, in which case $u_i(t+1) = u_i(t) - w^*$. For this to happen,
node $i$ received a message (Request, $r$) at time $t$ from a node $k$ for which $\mathrm{Rout}_k(t) = i$,
and with $u_i(t) - r \geq 2$. That node $k$ was the last node, $s_\ell(t)$, on the request path of
some originator node $\ell$. Node $k$ receives an (Accept, $w^*$) message at time $t + 1$ and,
therefore, according to our definition, this sets $w_\ell(t+1) = w^*$.

It follows from the rules of our algorithm that $\ell$ had originated a request with value
$r = u_\ell(t')$ at some previous time $t'$. Subsequently, node $\ell$ entered the blocked mode,
preventing any modification of $u_\ell$, so that $r = u_\ell(t) = u_\ell(t+1)$. Moreover, observe
that $w_\ell(t)$ was 0 because by time $t$, no node had answered $\ell$'s request. Furthermore,
$w_i(t+1) = w_i(t) = 0$ because having a positive $w_i$ requires $i$ to be in blocked mode,
preventing the execution of "accept request". It follows that

$$\hat{u}_i(t+1) = u_i(t+1) = u_i(t) - w^* = \hat{u}_i(t) - w^*,$$

and

$$\hat{u}_\ell(t+1) = r + w^* = \hat{u}_\ell(t) + w^*.$$

Using the update equation $w^* = \lfloor (u_i(t) - r)/2 \rfloor$, and the fact $u_i(t) - r \geq 2$, we obtain

$$1 \leq w^* \leq \frac{1}{2}\left(u_i(t) - r\right) = \frac{1}{2}\left(\hat{u}_i(t) - \hat{u}_\ell(t)\right).$$

Combining with the previous equalities, we have

$$\hat{u}_\ell(t) + 1 \leq \hat{u}_\ell(t+1) \leq \hat{u}_i(t+1) \leq \hat{u}_i(t) - 1.$$

Assume for a moment that node i was the only one that executed the "accept request" 
instruction at time $t$. Then, all of the variables $\hat{u}_j$, for $j \neq i, \ell$, remain unchanged.
Simple algebraic manipulations then show that V decreases by at least 2. If there 
was another pair of nodes, say j and k, that were involved in a transfer of value 
at time t, it is not hard to see that the transfer of value was related to a different 
request, involving a separate request path. In particular, the pairs $\ell, i$ and $j, k$ do not
overlap. This implies that the cumulative effect of multiple transfers on the difference 
V(t + 1) — V(t) is the sum of the effects of individual transfers. Thus, at every time 
for which at least one "accept request" step is executed, V decreases by at least 2. 
We also see that no operation can ever result in an increase of V. It follows that the 
instruction "accept request" can be executed only a finite number of times. □ 

Proposition 8.1. There is a time $t'$ such that $u_i(t) = u_i(t')$, for all $i$ and all $t \geq t'$.
Moreover,

$$\sum_i u_i(t') = \sum_i x_i,$$

$$\max_i u_i(t') - \min_i u_i(t') \leq 1.$$

Proof. It follows from Lemma 8.16 that there is a time $t'$ after which no more requests
are accepted with $w \neq 0$. By Lemma 8.15, this implies that after at most $n$ additional
time steps, the system will never again contain any "accept request" messages with
$w \neq 0$, so no node will change its value $u_i(t)$ thereafter.

We have already argued that the sum (and therefore the average) of the variables 
$\hat{u}_i(t)$ does not change. Once there are no more "accept request" messages in the
system with $w \neq 0$, we must have $w_i(t) = 0$, for all $i$. Thus, at this stage the average
of the $u_i(t)$ is the same as the average of the $x_i$.

It remains to show that once the $u_i(t)$ stop changing, the maximum and minimum
$u_i(t)$ differ by at most 1. Recall (cf. Theorem 8.2) that at some time after the
$u_i(t)$ stop changing, all estimates $M_i(t)$ of the maximum will be equal to $M(t)$, the
true maximum of the $u_i(t)$; moreover, starting at any node and following the pointers
$P_i(t)$ leads to a node $j$ whose value $u_j(t)$ is the true maximum, $M(t)$. Now let $A$
be the set of nodes whose value at this stage is at most $\max_i u_i(t) - 2$. To derive a
contradiction, let us suppose that $A$ is nonempty.

Because only nodes in $A$ will originate requests, and because every request eventually
terminates (cf. Lemma 8.15), if we wait some finite amount of time, we will have
the additional property that all requests in the system originated from A. Moreover, 
nodes in A originate requests every time they are in the free mode, which is infinitely 
often. 

Consider now a request originating at a node in the set $A$. The value $r$ of such a
request satisfies $M(t) - r \geq 2$, which implies that every node that receives it either
accepts it (contradicting the fact that no more requests are accepted after time $t'$), or
forwards it, or denies it. But a node $i$ will deny a request only if it is in blocked mode,
that is, if it has already forwarded some other request to node $P_i(t)$. This shows that
requests will keep propagating along links of the form $(i, P_i(t))$, and therefore will
eventually reach a node at which $u_i(t) = M(t) \geq r + 2$, at which point they will be
accepted, a contradiction. □

We are now ready to conclude. 

Proof of Theorem 8.1. Let $u_i^*$ be the value that $u_i(t)$ eventually settles on. Proposition
8.1 readily implies that if the average $\bar{x}$ of the $x_i$ is an integer, then $u_i(t) = u_i^* = \bar{x}$
will eventually hold for every $i$. If $\bar{x}$ is not an integer, then some nodes will eventually
have $u_i(t) = u_i^* = \lfloor \bar{x} \rfloor$ and some other nodes $u_i(t) = u_i^* = \lceil \bar{x} \rceil$. Besides, using the
maximum and minimum computation algorithm, nodes will eventually have a correct
estimate of $\max_i u_i^*$ and $\min_i u_i^*$, since all $u_i(t)$ settle on the fixed values $u_i^*$. This allows
the nodes to determine whether the average is exactly $u_i^*$ (integer average), or
whether it lies in $(u_i^*, u_i^* + 1)$ or $(u_i^* - 1, u_i^*)$ (fractional average). Thus, with some
simple post-processing at each node (which can be done using finite automata), the
nodes can produce the correct output for the interval-averaging problem. The proof
of Theorem 8.1 is complete. □
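
The post-processing step at the end of the proof can be sketched as follows. This is an illustrative Python sketch, not the finite-automaton implementation referred to above: the function name `interval_average_output` and the encoding of the output as a pair (with equal endpoints meaning an exact integer average, and distinct endpoints meaning the open interval between them) are assumptions made for the example.

```python
def interval_average_output(u_star, u_min, u_max):
    """Output of one node once its value and the max/min estimates have settled.

    u_star -- the node's own settled value
    u_min  -- settled estimate of the minimum over all nodes
    u_max  -- settled estimate of the maximum over all nodes
    Returns (a, a) if the average equals the integer a, and (a, a + 1)
    if the average lies strictly between a and a + 1.
    """
    if u_max - u_min > 1:
        raise ValueError("values have not settled: max and min differ by more than 1")
    if u_max == u_min:
        # All nodes hold the same value, which is therefore the average.
        return (u_star, u_star)
    if u_star == u_min:
        # This node holds the floor of the average.
        return (u_star, u_star + 1)
    # This node holds the ceiling of the average.
    return (u_star - 1, u_star)
```

With settled values 3, 3, 4 (average 10/3), every node reports the interval between 3 and 4, regardless of which of the two values it holds.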

Next, we give a convergence time bound for the algorithms we have just described. 

The general idea is quite simple. We have just argued that the nonnegative function
$V(t)$ decreases by at least 2 each time a request is accepted. It also satisfies
$V(0) = O(nK^2)$. Thus there are at most $O(nK^2)$ acceptances. To complete the
argument, one needs to argue that if the algorithm has not terminated, there will
be an acceptance within $O(n)$ iterations (which corresponds to $O(n \log K)$ real-time
rounds of communication due to the $O(\log K)$ slowdown from transmitting elements
in $\{0, \ldots, K\}$). This should be fairly clear from the proof of Theorem 8.1. A formal
argument is given in the next section. It is also shown there that the running time of
our algorithm, for many graphs, satisfies an $\Omega(n^2)$ lower bound, in the worst case over
all initial conditions.

8.4 The Time to Termination 

The aim of this section is to prove an $O(n^2 K^2 \log K)$ upper bound on the time to
termination of the interval-averaging algorithm from Section 8.3.

We will use the notation $M'(t)$ to denote the largest estimate held by any node
at time $t$ or in the $n$ time steps preceding it:

$$M'(t) = \max_{\substack{i = 1, \ldots, n \\ k = t, t-1, \ldots, t-n}} M_i(k).$$

For $M'(t)$ to be well defined, we will adopt the convention that for all negative times
$k$, $M_i(k) = u_i(0)$.
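
As a concrete reading of this definition, the short Python sketch below evaluates $M'(t)$ from a recorded history of estimates. The function name `windowed_max` and the list-of-lists layout of the history are assumptions made for illustration only.

```python
def windowed_max(M_history, u_initial, t):
    """M'(t): largest estimate M_i(k) over all nodes i and times k = t-n, ..., t.

    M_history -- M_history[k][i] is M_i(k) for k = 0, 1, ..., at least t
    u_initial -- u_initial[i] is u_i(0); its length is the number of nodes n
    """
    n = len(u_initial)
    best = None
    for k in range(t - n, t + 1):
        # Convention from the text: M_i(k) = u_i(0) for negative times k.
        row = M_history[k] if k >= 0 else u_initial
        row_max = max(row)
        if best is None or row_max > best:
            best = row_max
    return best
```

Note that for $t < n$ the window reaches back to negative times, so the initial values $u_i(0)$ enter the maximum.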

Lemma 8.17. In the course of the execution of the interval-averaging algorithm,
$M'(t)$ never increases.

Proof. Fix a time $t$. We will argue that

$$M_i(t+1) \leq M'(t) \qquad (8.3)$$

for each $i$. This clearly implies $M'(t+1) \leq M'(t)$.

If $M_i(t+1) \leq M_i(t)$, then Eq. (8.3) is obvious. We can thus suppose that
$M_i(t+1) > M_i(t)$. There are only three boxes in Figure 8-1 which result in a change
between $M_i(t)$ and $M_i(t+1)$. These are O2, O1, and O4a. We can rule out the
possibility that node $i$ executes O2, since that merely sets $M_i(t+1)$ to some $M_j(t)$,
and thus cannot result in $M_i(t+1) > M'(t)$.

Consider, then, the possibility that node $i$ executes O1 or O4a, and as a consequence
$M_i(t+1) = u_i(t)$. If $u_i(t) \leq u_i(t-1)$, then we are finished because

$$M_i(t+1) = u_i(t) \leq u_i(t-1) \leq M_i(t),$$

which contradicts the assumption $M_i(t+1) > M_i(t)$. Note that the last step of the
above chain of inequalities used Lemma 8.5.

Thus we can assume that $u_i(t) > u_i(t-1)$. In this case, $i$ must have fulfilled an
acceptance from some node $j$ during slot $t - 1$. Let $\hat{t}$ be the time when node $j$ received
the corresponding request message from node $i$. The rules of our algorithm imply that
$u_i(\hat{t}) = u_i(t-1)$, and that the quantity $w$ sent by $j$ to $i$ in response to the request is
no greater than $\frac{1}{2}(u_j(\hat{t}) - u_i(\hat{t}))$. This implies that $u_i(t) = u_i(t-1) + w < u_j(\hat{t})$.

Crucially, we have that $\hat{t} \in [t - 1 - n, t - 1]$, since at most $n + 1$ time steps pass
between the time node $j$ receives the request message it will accept and the time
when node $i$ fulfills $j$'s acceptance. So

$$M_i(t+1) = u_i(t) < u_j(\hat{t}) \leq M_j(\hat{t}+1) \leq M'(t).$$

We have thus shown that $M_i(t+1) \leq M'(t)$ in every possible case, which implies
that $M'(t+1) \leq M'(t)$. □

Lemma 8.18. Consider the maximum tracking algorithm. If each $u_i(t)$ is constant
for $t \in [t_0, t_0 + 4n]$, then at least one of the following two statements is true: (a)
$M'(t_0 + 3n) < M'(t_0)$. (b) $M_i(t_0 + 4n) = \max_j u_j(t_0)$ for every $i$.

Proof. Suppose first that no node holds an estimate equal to $M'(t_0)$ at some time
between $t_0 + 2n$ and $t_0 + 3n$. Then it follows from the definition of $M'(t)$ and its
monotonicity (Lemma 8.17) that condition (a) holds. Suppose now that some node holds an
estimate equal to $M'(t_0)$ at some time between $t_0 + 2n$ and $t_0 + 3n$. The definition of
$M'(t)$ and the monotonicity of $\max_i M_i(t)$ when all $u_i$ are constant (Lemma 8.6) imply
that $M'(t_0) = \max_i M_i(t)$ for all $t \in [t_0, t_0 + 3n]$. It follows from repeated application
of Lemmas 8.10 and 8.11 (similarly to what is done in the proof of Proposition 8.12)
that every estimate $M'(t_0)$ at time $t_0 + 2n$ is valid, which by definition implies the
existence of at least one node $i$ with $u_i(t_0) = M'(t_0)$. Besides, since $M_i(t) \geq u_i(t)$
holds for all $i$ and $t$ by Lemma 8.5, and since we know that $M'(t_0) = \max_i M_i(t)$ for all
$t \in [t_0, t_0 + 3n]$, we have $M'(t_0) = \max_i u_i(t_0)$. As described in the proof of Theorem
8.2, this implies that after at most $2n$ more time steps, $M_i(t) = M'(t_0)$ holds for every
$i$, and so (b) holds. □

The next lemma upper bounds the largest time before some request is accepted 
or some outdated estimate is purged from the system. Recall that $\bar{x} = (\sum_{i=1}^{n} x_i)/n$.

Lemma 8.19. Consider the interval-averaging algorithm described in Sections 8.2
and 8.3. For any $t_0$, at least one of the following is true: (a) Some node accepts
a request at some slot $t \in [t_0, t_0 + 8n - 1]$. (b) We have $M'(t+1) < M'(t)$ for
some $t \in [t_0 + 1, t_0 + 4n]$. (c) All $u_i$ remain forever constant after time $t_0 + n$,
with $u_i \in \{\lfloor \bar{x} \rfloor, \lceil \bar{x} \rceil\}$, and all $M_i(t)$ remain forever constant after time $t_0 + 5n$, with
$M_i = \lceil \bar{x} \rceil$.

Proof. Suppose that condition (a) does not hold, i.e., that no node accepts a request
between $t_0$ and $t_0 + 8n$. Since an acceptance message needs to travel through at most
$n - 2$ intermediate nodes before reaching the originator of the request (cf. Lemma
8.15), we conclude that the system does not contain any acceptance messages after
time $t_0 + n$. As a result, no node modifies its value $u_i(t)$ between times $t_0 + n$ and
$t_0 + 8n$. Abusing notation slightly, we will call these values $u_i$.

It follows from Lemma 8.18 that either condition (b) holds, or that there is a time
$\tilde{t} \leq t_0 + 5n$ at which $M_i(\tilde{t}) = \max_j u_j$ for every $i$. Some requests may have been
emitted in the interval $[t_0, t_0 + 5n]$. Since we are assuming that condition (a) does not
hold, these requests must have all been denied. It follows from Lemma 8.15 that none
of these requests is present by time $t_0 + 7n$. Moreover, by the rules of our algorithm,
once $M_i$ becomes equal to $\max_j u_j$ for every node $i$, every node with $u_i \leq \max_j u_j - 2$
will keep emitting requests. Using an argument similar to the one at the end of the
proof of Proposition 8.1, if such requests are sent, at least one must be accepted
within $n$ time steps, that is, no later than time $t_0 + 8n$. Since by assumption this has
not happened, no such requests could have been sent, implying that $u_i \geq \max_j u_j - 1$
for every $i$. Moreover, this implies that no request or acceptance messages are ever sent
after time $t_0 + 7n$, so that the $u_i$ never change. It is easy to see that the $M_i$ never change
as well, so that condition (c) is satisfied. □

We can now give an upper bound on the time until our algorithm terminates. 

Theorem 8.3. The interval-averaging algorithm described in Section 8.3 terminates
after at most $O(n^2 K^2 \log K)$ time steps.

Proof. Consider the function $V(t) = \sum_{i=1}^{n} (\hat{u}_i(t) - \bar{u})^2$, where $\bar{u}$ is the average of the
$x_i$, which is also the average of the $\hat{u}_i$, and where the $\hat{u}_i(t)$ are as defined before Lemma
8.16. Since $\hat{u}_i(t) \in \{0, 1, \ldots, K\}$ for all $i$, one can verify that $V(0) \leq \frac{1}{4} n K^2$. Moreover,
as explained in the proof of Lemma 8.16, $V(t)$ is non-increasing, and decreases by at
least 2 with every request acceptance. Therefore, a total of at most $\frac{1}{8} n K^2$ requests can
be accepted. Furthermore, we showed that $M'(t)$ is non-increasing, and since $M'(t)$
always belongs to $\{0, 1, \ldots, K\}$, it can strictly decrease at most $K$ times. It follows
then from Lemma 8.19 that condition (c) must hold after at most $\frac{1}{8} n K^2 \cdot 8n + K \cdot 8n + 5n$
time steps.

Recall that in parallel with the maximum-tracking and averaging algorithm, we
also run a minimum tracking algorithm. In the previous paragraph, we demonstrated
that condition (c) of Lemma 8.19 holds, i.e., the $u_i$ remain fixed forever, after $n^2 K^2 + K \cdot 8n$
time steps. A similar argument to Lemma 8.18 implies that the minimum algorithm
will reach a fixed point after an additional $(3K + 4)n$ steps.

Putting it all together, the algorithm reaches a fixed point after $n^2 K^2 + (6K + 4) \cdot n$
steps. Accounting in addition for the $\log K$ slowdown from transmitting values in
$\{0, \ldots, K\}$, we obtain a convergence time of $O(n^2 K^2 \log K)$. □

We note that there are cases where the running time of interval averaging is
quadratic in $n$. For example, consider the network in Figure 8-3, consisting of two
arbitrary connected graphs $G_1, G_2$ with $n/3$ nodes each, connected by a line graph
of $n/3$ nodes. Suppose that $K = 2$, and that $x_i = 0$ if $i \in G_1$, $x_i = 2$ if $i \in G_2$, and
$x_i = 1$ otherwise. The algorithm will have the nodes $i$ of $G_1$ with $u_i = 0$ send requests
to nodes $j$ in $G_2$ with $u_j = 2$, and each successful request will result in the pair of
nodes changing their values, $u_i$ and $u_j$, to 1. The system will reach its final state
after $n/3$ such successful requests. Observe now that each successful request must
cross the line graph, which takes $n/3$ time steps in each direction. Moreover, since
nodes cannot simultaneously treat multiple requests, once a request begins crossing
the line graph, all other requests are denied until the response to the first request
reaches $G_1$, which takes at least $2n/3$ time steps. Therefore, in this example, it takes
at least $2n^2/9$ time steps until the algorithm terminates.
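
The arithmetic of this example can be made explicit with a small Python sketch. It only builds the initial values and multiplies the number of successful requests by the round-trip time across the line; it does not simulate the algorithm. The function name `two_cluster_example` and the ordering of the nodes in the returned list are illustrative assumptions.

```python
def two_cluster_example(n):
    """Initial values and step lower bound for the two-cluster network.

    n must be a positive multiple of 3. Nodes are listed as G1, then the
    connecting line, then G2 (an arbitrary ordering chosen for this sketch).
    """
    if n <= 0 or n % 3 != 0:
        raise ValueError("n must be a positive multiple of 3")
    third = n // 3
    x = [0] * third + [1] * third + [2] * third
    successful_requests = third          # one per node of G1
    steps_per_request = 2 * third        # across the line and back
    return x, successful_requests * steps_per_request
```

For $n = 9$ this gives the initial values 0, 0, 0, 1, 1, 1, 2, 2, 2 and a bound of $3 \cdot 6 = 18 = 2 \cdot 9^2 / 9$ time steps.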



8.5 Simulations 

We report here on simulations involving our algorithm on several natural graphs. 
Figures 8-4 and 8-5 describe the results for a complete graph and a line. Initial
conditions were random integers between 1 and 30, and each data point represents 
the average of two hundred runs. As expected, convergence is faster on the complete 
graph. Moreover, convergence time in both simulations appears to be approximately 
linear. 

Figure 8-3: A class of networks and initial conditions for which our algorithm takes
$\Theta(n^2)$ time steps to reach its final state.

8.6 Concluding remarks 

In this chapter, we have given a deterministic algorithm for averaging which stores a 
constant number of bits per link. Unfortunately, this algorithm works only for
static graphs; whether such an algorithm exists for time-varying graph sequences is 
still an open question. 

As discussed in Chapter 2, low storage is one of the main reasons to pick an
averaging algorithm over competitors such as flooding. It is therefore interesting 
to consider just how low the storage requirements of averaging algorithms are. In 
the next chapter, we will ask the analogous question for the problem of computing 
arbitrary functions. 

Figure 8-4: The number of iterations as a function of the number of nodes for a
complete graph.

Figure 8-5: The number of iterations as a function of the number of nodes for a line
graph.

Chapter 9



A framework for distributed 
function computation 

We now wish to take a broader perspective and consider general distributed function 
computation problems modeled on consensus. Indeed, the goal of many multi-agent 
systems, distributed computation algorithms, and decentralized data fusion methods 
is to have a set of nodes compute a common value based on initial values or observa- 
tions at each node. Towards this purpose, the nodes, which we will sometimes refer 
to as agents, perform some internal computations and repeatedly communicate with 
each other. The objective of this chapter is to understand the fundamental limita- 
tions and capabilities of such systems and algorithms when the available information 
and computational resources at each node are limited. 

Our exposition in this chapter will follow the preprint [50], where the results 
described here have previously appeared. 

9.1 Motivation 

(a) Quantized consensus: Suppose that each node begins with an integer value
$x_i(0) \in \{0, \ldots, K\}$. We would like the nodes to end up, at some later time, with values
$y_i$ that are almost equal, i.e., $|y_i - y_j| \leq 1$, for all $i, j$, while preserving the sum of the
values, i.e., $\sum_{i=1}^{n} x_i(0) = \sum_{i=1}^{n} y_i$. This is the so-called quantized averaging problem,
which has received considerable attention recently; see, e.g., [53 HOI El ESI [H], and
we have talked about it at some length in the previous Chapters 7 and 8. It may
be viewed as the problem of computing the function $(1/n) \sum_{i=1}^{n} x_i$, rounded to an
integer value.

(b) Distributed hypothesis testing and majority voting: Consider $n$ sensors
interested in deciding between two hypotheses, $H_0$ and $H_1$. Each sensor collects
measurements and makes a preliminary decision $x_i \in \{0, 1\}$ in favor of one of the
hypotheses. The sensors would like to make a final decision by majority vote, in which
case they need to compute the indicator function of the event $\sum_{i=1}^{n} x_i \geq n/2$, in a
distributed way. Alternatively, in a weighted majority vote, they may be interested in
computing the indicator function of an event such as $\sum_{i=1}^{n} x_i \geq 3n/4$. A variation of
this problem involves the possibility that some sensors abstain from the vote, perhaps
due to their inability to gather sufficiently reliable information.

(c) Direction coordination on a ring: Consider $n$ vehicles placed on a ring, each
with some arbitrarily chosen direction of motion (clockwise or counterclockwise). We
would like the $n$ vehicles to agree on a single direction of motion. A variation of
this problem was considered in [71], where, however, additional requirements on the
vehicles were imposed which we do not consider here. The solution provided in
[71] was semi-centralized in the sense that vehicles had unique numerical identifiers,
and the final direction of most vehicles was set to the direction of the vehicle with
the largest identifier. We wonder whether the direction coordination problem can
be solved in a completely decentralized way. Furthermore, we would like the final
direction of motion to correspond to the initial direction of the majority of the vehicles:
if, say, 90% of the vehicles are moving counterclockwise, we would like the other 10% to
turn around. If we define $x_i$ to be 1 when the $i$th vehicle is initially oriented clockwise,
and 0 if it is oriented counterclockwise, then coordinating on a direction involves the
distributed computation of the indicator function of the event $\sum_{i=1}^{n} x_i \geq n/2$.

(d) Solitude verification: This is the problem of checking whether exactly one
node has a given state. This problem is of interest if we want to avoid simultaneous
transmissions over a common channel |1H], or if we want to maintain a single leader
(as in motion coordination; see for example [22]). Given states $x_i \in \{0, 1, \ldots, K\}$,
solitude verification is equivalent to the problem of computing the binary function
which is equal to 1 if and only if $|\{i : x_i = 0\}| = 1$.
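
For reference, the target functions in examples (a) through (d) can be written down as ordinary centralized computations. The Python sketch below states only what is to be computed, not how a network of anonymous finite automata would compute it; the function names are hypothetical, and taking the floor in `quantized_average` is one admissible choice of integer rounding.

```python
def quantized_average(x):
    """(a) The average of x rounded to an integer (here: rounded down)."""
    return sum(x) // len(x)


def threshold_vote(x, num=1, den=2):
    """(b), (c) Indicator of the event sum(x) >= (num/den) * n.

    The default num/den = 1/2 is the majority vote; num=3, den=4 gives the
    3n/4 threshold. Integer arithmetic avoids floating-point comparisons.
    """
    return int(den * sum(x) >= num * len(x))


def solitude(x):
    """(d) 1 if exactly one entry of x equals 0, else 0."""
    return int(sum(1 for v in x if v == 0) == 1)
```

For example, `threshold_vote([1, 0, 1, 0])` is 1 because two of four votes meet the $n/2$ threshold, and `solitude([0, 1, 2])` is 1 because exactly one node is in state 0.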

There are numerous methods that have been proposed for solving problems such 
as the above. Oftentimes, different algorithms involve different computational capa- 
bilities on the part of the nodes, which makes it hard to talk about a "best" algorithm. 
At the same time, simple algorithms (such as setting up a spanning tree and aggregating
information by progressive summations over the tree, as in Chapter 2) are
often considered undesirable because they require too much coordination or global 
information. It should be clear that a sound discussion of such issues requires the 
specification of a precise model of computation, followed by a systematic analysis of 
fundamental limitations under a given model. This is precisely the objective of this 
chapter: to propose a particular model, and to characterize the class of functions 
computable under this model. 

9.1.1 The features of our model 

Our model provides an abstraction for common requirements for distributed algo- 
rithms in the wireless sensor network literature. We model the nodes as interacting 
deterministic finite automata that exchange messages on a fixed undirected network, 
with no time delays or unreliable transmissions. Some important qualitative features 
of our model are the following. 

Identical nodes: Any two nodes with the same number of neighbors must run the 
same algorithm. 

Anonymity: A node can distinguish its neighbors using its own, private, local identifiers.
However, nodes do not have global identifiers.

Determinism: Randomization is not allowed. This restriction is imposed in order 
to preclude essentially centralized solutions that rely on randomly generated distinct 
identifiers and thus bypass the anonymity requirement. Clearly, developing algorithms 
is much harder, and sometimes impossible, when randomization is disallowed. 

Limited memory: We focus on the case where the nodes can be described by finite 
automata, and pay special attention to the required memory size. Ideally, the number 
of memory bits required at each node should be bounded above by a slowly growing 
function of the degree of a node. 

Absence of global information: Nodes have no global information, and do not 
even have an upper bound on the total number of nodes. Accordingly, the algorithm 
that each node is running is independent of the network size and topology. 

Convergence requirements: Nodes hold an estimated output that must converge 
to a desired value which is a function of all nodes' initial observations or values. In 
particular, for the case of discrete outputs, all nodes must eventually settle on the 
desired value. On the other hand, the nodes do not need to become aware of such 
termination, which is anyway impossible in the absence of any global information [7].

In this chapter, we only consider the special case of fixed graph topologies, 
where the underlying (and unknown) interconnection graph does not change with 
time. Developing a meaningful model for the time-varying case and extending our 
algorithms to that case is an interesting topic, but outside the scope of this thesis. 

9.1.2 Literature review 

There is a very large literature on distributed function computation in related models 
of computation [HI [61]. This literature can be broadly divided into two strands, 
although the separation is not sharp: works that address general computability issues 
for various models, and works that focus on the computation of specific functions, 
such as the majority function or the average. We start by discussing the first strand. 

A common model in the distributed computing literature involves the requirement 
that all processes terminate once the desired output is produced and that nodes 
become aware that termination has occurred. A consequence of the termination 
requirement is that nodes typically need to know the network size n (or an upper 
bound on $n$) to compute non-trivial functions. We refer the reader to [3], [TJ, [102],
[60], [75] for some fundamental results in this setting, and to [12] for a comprehensive
summary of known results. Closest to our work is the reference [33], which provides
an impossibility result very similar to our Theorem 9.1 for a closely related model of
computation.

The biologically-inspired "population algorithm" model of distributed computa- 
tion has some features in common with our model, namely, anonymous, bounded- 
resource nodes, and no requirement of termination awareness; see [6] for an overview 
of available results. However, this model involves a different type of node interactions 
from the ones we consider; in particular, nodes interact pairwise at times that may
be chosen adversarially. 

Regarding the computation of specific functions, [62] shows the impossibility of
majority voting if the nodes are limited to a binary state. Some experimental memoryless
algorithms (which are not guaranteed to always converge to the correct answer)
have been proposed in the physics literature. Several papers have quantified the
performance of simple heuristics for computing specific functions, typically in ran- 
domized settings. We refer the reader to [51], which studied simple heuristics for 
computing the majority function, and to [86], which provides a heuristic that has 
guarantees only for the case of complete graphs. 

The large literature on quantized averaging often tends to involve themes similar 
to those addressed in this chapter jlQl |59l El [28l [55]. However, the underlying models 
of computation are typically more powerful than ours, as they allow for randomization 
and unbounded memory. Closer to the current chapter, [7^1 develops an algorithm 
with $O(n^2)$ convergence time for a variant of the quantized averaging problem, but
requires unbounded memory. Reference [11] provides an algorithm for the particular
quantized averaging problem that we consider in Section 9.4 (called in [11] the "interval
consensus problem"), which uses randomization but only bounded memory (a
total of two bits at each node). Its convergence time is addressed in [3S], but a precise 
convergence time bound, as a function of n, is not available. Nevertheless, it appears 
to be significantly higher than O(n^). Similarly, the algorithm in |103] runs in 0{n^) 
time for the case of fixed graphs. (However, we note that [103] also addresses the
case of time-varying graphs.) Roughly speaking, the algorithms in [11], [103] work by
having positive and negative "load tokens" circulate randomly in the network until 
they meet and annihilate each other. Our algorithm involves a similar idea. However, 
at the cost of some algorithmic complexity, our algorithm is deterministic. This al- 
lows for fast progress, in contrast to the slow progress of algorithms that need to wait 
until the coalescence time of two independent random walks. Finally, a deterministic 
algorithm for computing the majority function (and some more general functions) 
was proposed in [S3]. However, the algorithm appears to rely on the computation of 
shortest path lengths, and thus requires unbounded memory at each node. 

Semi-centralized versions of the problem, in which the nodes ultimately transmit 
to a fusion center, have often been considered in the literature, e.g., for distributed 
statistical inference [H] or detection [56]. The papers [15], [57], and [65] consider the 
complexity of computing a function and communicating its value to a sink node. We 
refer the reader to the references therein for an overview of existing results in such 
semi-centralized settings. However, the underlying model is fundamentally different 
from ours, because the presence of a fusion center violates our anonymity assumption. 

Broadly speaking, our results differ from previous works in several key respects: 
(i) Our model, which involves totally decentralized computation, deterministic algo- 
rithms, and constraints on memory and computation resources at the nodes, but does 
not require the nodes to know when the computation is over, is different from that 
considered in almost all of the relevant literature. (ii) Our focus is on identifying
computable and non-computable functions under our model, and we achieve a nearly
tight separation, as evidenced by Theorem 9.1 and Corollary 9.4.

9.1.3 Summary and Contributions



We provide a general model of decentralized anonymous computation on fixed graphs,
with the features described in Section 9.1.1, and characterize the type of functions of
the initial values that can be computed.

We prove that if a function is computable under our model, then its value can
only depend on the frequencies of the different possible initial values. For example,
if the initial values $x_i$ are binary, a computable function can only depend on
$p_0 := |\{i : x_i = 0\}|/n$ and $p_1 := |\{i : x_i = 1\}|/n$. In particular, determining the number of
nodes, or whether at least two nodes have an initial value of 1, is impossible.

Conversely, we prove that if a function only depends on the frequencies of the 
different possible initial values (and is measurable), then the function can be ap- 
proximated with any given precision, except possibly on a set of frequency vectors 
of arbitrarily small volume. Moreover, if the dependence on these frequencies can 
be expressed through a combination of linear inequalities with rational coefficients, 
then the function is computable exactly. In particular, the functions involved in the 
quantized consensus, distributed hypothesis testing, and direction coordination ex- 
amples are computable, whereas the function involved in solitude verification is not. 
Similarly, statistical measures such as the standard deviation of the distribution of 
the initial values can be approximated with arbitrary precision. Finally, we show that 
with infinite memory, the frequencies of the different initial values (i.e., $p_0$, $p_1$ in the
binary case) are computable exactly, thus obtaining a precise characterization of the
computable functions in this case.

The key to our positive results is the algorithm for calculating the (suitably quantized)
average of the initial values described in the previous chapter (Chapter 8).

9.1.4 Outline 

In Section 9.2, we describe formally our model of computation. In Section 9.3, we
establish necessary conditions for a function to be computable. In Section 9.4, we
provide sufficient conditions for a function to be computable or approximable. Our
positive results rely on an algorithm that keeps track of nodes with maximal values,
and an algorithm that calculates a suitably rounded average of the nodes' initial values;
these were described in the previous chapter (Chapter 8). We end with some concluding
remarks in Section 9.5.

9.2 Formal description of the model 

Under our model, a distributed computing system consists of three elements: 

(a) A network: A network is a triple $(n, G, \mathcal{L})$, where $n$ is the number of nodes,
and $G = (V, E)$ is a connected undirected graph with $n$ nodes (and no
self-arcs). We define $d(i)$ as the degree of node $i$. Finally, $\mathcal{L}$ is a port labeling which assigns
a port number (a distinct integer in the set $\{0, 1, \ldots, d(i)\}$) to each outgoing edge of
any node $i$.

We are interested in algorithms that work for arbitrary port labelings. However, in
the case of wireless networks, port labelings can be assumed to have some additional
structure. For example, if two neighboring nodes $i$ and $j$ have coordinated so as to
communicate over a distinct frequency, they should be able to coordinate their port
numbers as well, so that the port number assigned by any node $i$ to an edge $(i, j)$ is
the same as the port number assigned by $j$ to edge $(j, i)$. We will say that a network
is edge-labeled if port numbers have this property. In Section 9.3, we will note that
our negative results also apply to edge-labeled networks.

(b) Input and output sets: The input set is a finite set X = {0, 1, . . . , K} to 
which the initial value of each node belongs. The output set is a finite set Y to which 
the output of each node belongs. 

(c) An algorithm: An algorithm is defined as a family of finite automata $(A_d)_{d=1,2,\ldots}$,
where the automaton $A_d$ describes the behavior of a node with degree $d$. The state
of the automaton $A_d$ is a tuple $[x, z, y; (m_1, \ldots, m_d)]$; we will call $x \in X$ the initial
value, $z \in Z_d$ the internal memory state, $y \in Y$ the output or estimated answer, and
$m_1, \ldots, m_d \in M$ the outgoing messages. The sets $Z_d$ and $M$ are assumed finite. We
allow the cardinality of $Z_d$ to increase with $d$. Clearly, this would be necessary for any
algorithm that needs to store the messages received in the previous time step. Each
automaton $A_d$ is identified with a transition law from $X \times Z_d \times Y \times M^d$ into itself,
which maps each $[x, z, y; (m_1, \ldots, m_d)]$ to some $[x, z', y'; (m'_1, \ldots, m'_d)]$. In words, at
each iteration, the automaton takes $x$, $z$, $y$, and incoming messages into account, to
create a new memory state, output, and (outgoing) messages, but does not change
the initial value.

Given the above elements of a distributed computing system, an algorithm proceeds
as follows. For convenience, we assume that the above defined sets $Y$, $Z_d$,
and $M$ contain a special element, denoted by $\emptyset$. Each node $i$ begins with an initial
value $x_i \in X$ and implements the automaton $A_{d(i)}$, initialized with $x = x_i$ and
$z = y = m_1 = \cdots = m_d = \emptyset$. We use $S_i(t) = [x_i, y_i(t), z_i(t), m_{i,1}(t), \ldots, m_{i,d(i)}(t)]$
to denote the state of node $i$'s automaton at time $t$. Consider a particular node $i$.
Let $j_1, \ldots, j_{d(i)}$ be an enumeration of its neighbors, according to the port numbers.
(Thus, $j_k$ is the node at the other end of the $k$th outgoing edge at node $i$.) Let $p_k$ be
the port number assigned to link $(j_k, i)$ according to the port labeling at node $j_k$. At
each time step, node $i$ carries out the following update:

$$[x_i, z_i(t+1), y_i(t+1); m_{i,1}(t+1), \ldots, m_{i,d(i)}(t+1)] = A_{d(i)}\left[x_i, z_i(t), y_i(t); m_{j_1,p_1}(t), \ldots, m_{j_{d(i)},p_{d(i)}}(t)\right].$$

In words, the messages $m_{j_k,p_k}(t)$, $k = 1, \ldots, d(i)$, "sent" by the neighbors of $i$ into
the ports leading to $i$ are used to transition to a new state and create new messages
$m_{i,k}(t+1)$, $k = 1, \ldots, d(i)$, that $i$ "sends" to its neighbors at time $t+1$. We say that
the algorithm terminates if there exists some $y^* \in Y$ (called the final output of the
algorithm) and a time $t'$ such that $y_i(t) = y^*$ for every $i$ and $t > t'$.

Consider now a family of functions $(f_n)_{n=1,2,\ldots}$, where $f_n : X^n \to Y$. We say that
such a family is computable if there exists a family of automata $(A_d)_{d=1,2,\ldots}$ such that
for any $n$, for any network $(n, G, \mathcal{L})$, and any set of initial conditions $x_1, \ldots, x_n$, the
resulting algorithm terminates and the final output is $f_n(x_1, \ldots, x_n)$.

As an exception to the above definitions, we note that although we primarily focus
on the finite case, we will briefly consider in Section 9.4.1 function families $(f_n)_{n=1,2,\ldots}$
computable with infinite memory, by which we mean that the internal memory sets
$Z_d$ and the output set $Y$ are countably infinite, the rest of the model remaining the
same.

The rest of the chapter focuses on the following general question: what families
of functions are computable, and how can we design a corresponding algorithm
$(A_d)_{d=1,2,\ldots}$? To illustrate the nature of our model and the type of algorithms that it
supports, we provide a simple example.

Detection problem: In this problem, all nodes start with a binary initial value
$x_i \in \{0, 1\} = X$. We wish to detect whether at least one node has an initial value equal
to 1. We are thus dealing with the function family $(f_n)_{n=1,2,\ldots}$, where
$f_n(x_1, \ldots, x_n) = \max\{x_1, \ldots, x_n\}$. This function family is computable by a family of automata with
binary messages, binary internal state, and with the following transition rule:

if $x_i = 1$ or $z_i(t) = 1$ or $\max_{j : (i,j) \in E} m_{ji}(t) = 1$ then
    set $z_i(t+1) = y_i(t+1) = 1$
    send $m_{ij}(t+1) = 1$ to every neighbor $j$ of $i$
else
    set $z_i(t+1) = y_i(t+1) = 0$
    send $m_{ij}(t+1) = 0$ to every neighbor $j$ of $i$
end if

In the above algorithm, we initialize by setting $m_{ij}(0)$, $y_i(0)$, and $z_i(0)$ to zero
instead of the special symbol $\emptyset$. One can easily verify that if $x_i = 0$ for every $i$, then
$y_i(t) = 0$ for all $i$ and $t$. If on the other hand $x_k = 1$ for some $k$, then at each time step
$t$, those nodes $i$ at distance less than $t$ from $k$ will have $y_i(t) = 1$. Thus, for connected
graphs, the algorithm will terminate within $n$ steps, with the correct output. It is
important to note, however, that because $n$ is unknown, a node $i$ can never know
whether its current output $y_i(t)$ is the final one. In particular, if $y_i(t) = 0$, node $i$
cannot exclude the possibility that $x_k = 1$ for some node $k$ whose distance from $i$ is
larger than $t$.
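The behavior of the detection automaton can be checked on a small example. The following is a minimal sketch, not part of the formal model: the helper names `run_detection` and `path_graph` are hypothetical, the graph is given as an adjacency dictionary, and all nodes are updated synchronously from zero initial memory, outputs, and messages, as in the text.

```python
def run_detection(adj, x, steps):
    """Iterate the detection automaton synchronously (illustrative helper).

    adj: dict node -> list of neighbors (undirected graph)
    x:   dict node -> initial value in {0, 1}
    Returns the list of outputs [y(0), y(1), ..., y(steps)], each a dict.
    """
    z = {i: 0 for i in adj}                          # internal memory z_i(0) = 0
    msg = {(i, j): 0 for i in adj for j in adj[i]}   # msg[(i, j)]: sent by i to j
    history = [{i: 0 for i in adj}]                  # y_i(0) = 0
    for _ in range(steps):
        new_z, new_msg = {}, {}
        for i in adj:
            # largest message currently sent to i by one of its neighbors
            incoming = max((msg[(j, i)] for j in adj[i]), default=0)
            bit = 1 if (x[i] == 1 or z[i] == 1 or incoming == 1) else 0
            new_z[i] = bit
            for j in adj[i]:
                new_msg[(i, j)] = bit
        z, msg = new_z, new_msg
        history.append(dict(z))                      # y_i(t+1) = z_i(t+1)
    return history


def path_graph(n):
    """Path 0 - 1 - ... - (n-1) as an adjacency dict (illustrative helper)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
```

On a path with five nodes and a single 1 at one end, the farthest node still outputs 0 at time 4 and switches to 1 at time 5, which matches the "distance less than $t$" statement and the bound of $n$ steps.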

9.3 Necessary condition for computability 

In this section we establish our main negative result, namely, that if a function family 
is computable, then the final output can only depend on the frequencies of the different 
possible initial values. Furthermore, this remains true even if we allow for infinite 
memory, or restrict to networks in which neighboring nodes share a common label for 
the edges that join them. This result is quite similar to Theorem 3 of [33], and so is the 
proof. Nevertheless, we provide a proof in order to keep the chapter self-contained. 
We first need some definitions. Recall that $X = \{0, 1, \ldots, K\}$. We let $D$ be the
unit simplex, that is, $D = \{(p_0, \ldots, p_K) \in [0, 1]^{K+1} : \sum_{k=0}^{K} p_k = 1\}$. We say that a
function $h : D \to Y$ corresponds to a function family $(f_n)_{n=1,2,\ldots}$ if for every $n$ and
every $x \in X^n$, we have

$$f_n(x_1, \ldots, x_n) = h\left(p_0(x_1, \ldots, x_n), p_1(x_1, \ldots, x_n), \ldots, p_K(x_1, \ldots, x_n)\right),$$

where

$$p_k(x_1, \ldots, x_n) = |\{i \mid x_i = k\}|/n,$$

so that $p_k(x_1, \ldots, x_n)$ is the frequency of occurrence of the initial value $k$. In this
case, we say that the family $(f_n)$ is frequency-based.

Theorem 9.1. Suppose that the family $(f_n)$ is computable with infinite memory.
Then, this family is frequency-based. The result remains true even if we only require
computability over edge-labeled networks.

The following are some applications of Theorem 9.1.

(a) The parity function $\sum_{i=1}^{n} x_i \pmod{k}$ is not computable, for any $k > 1$.

(b) In a binary setting ($X = \{0, 1\}$), checking whether the number of nodes with
$x_i = 1$ is larger than or equal to the number of nodes with $x_i = 0$ plus 10 is not
computable.

(c) Solitude verification, i.e., checking whether $|\{i : x_i = 0\}| = 1$, is not computable.

(d) An aggregate difference function such as $\sum_{i<j} |x_i - x_j|$ is not computable, even
if it is to be calculated modulo $k$.
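To see concretely why such functions fail the frequency-based test, it is enough to exhibit two inputs with the same frequency vector and different function values. The sketch below does this for parity and solitude verification by comparing a vector with its self-concatenation; the helper names are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from fractions import Fraction


def frequencies(x, K):
    """Frequency vector (p_0, ..., p_K) of the initial values in x."""
    counts = Counter(x)
    return tuple(Fraction(counts[k], len(x)) for k in range(K + 1))


def parity(x, k=2):
    """Sum of the initial values modulo k."""
    return sum(x) % k


def solitude(x):
    """True when exactly one node has initial value 0."""
    return sum(1 for xi in x if xi == 0) == 1


x = (1, 0, 0)
xx = x + x          # self-concatenation: same frequencies, twice as many nodes
same_frequencies = frequencies(x, 1) == frequencies(xx, 1)
```

Here `x` and `xx` have identical frequencies, yet `parity(x)` is 1 while `parity(xx)` is 0, so parity cannot be written as a function of the frequencies alone. The pair `(0, 1)` and `(0, 1, 0, 1)` plays the same role for solitude verification.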

9.3.1 Proof of Theorem 9.1

The proof of Theorem 9.1 involves a particular degree-two network (a ring), in which
all port numbers take values in the set $\{0, 1, 2\}$, and in which any two edges $(i, j)$
and $(j, i)$ have the same port number. The proof proceeds through a sequence of
intermediate results, starting with the following lemma, which can be easily proved 
by induction on time; its proof is omitted. 

Lemma 9.1. Suppose that $G = (\{1, \ldots, n\}, E)$ and $G' = (\{1, \ldots, n\}, E')$ are isomorphic;
that is, there exists a permutation $\pi$ such that $(i, j) \in E$ if and only if
$(\pi(i), \pi(j)) \in E'$. Furthermore, suppose that the port label at node $i$ for the edge
leading to $j$ in $G$ is the same as the port label at node $\pi(i)$ for the edge leading to $\pi(j)$
in $G'$. Then, the state $S_i(t)$ resulting from the initial values $x_1, \ldots, x_n$ on the graph
$G$ is the same as the state $S_{\pi(i)}(t)$ resulting from the initial values $x_{\pi^{-1}(1)}, \ldots, x_{\pi^{-1}(n)}$
on the graph $G'$.

Lemma 9.2. Suppose that the family $(f_n)_{n=1,2,\ldots}$ is computable with infinite memory
on edge-labeled networks. Then, each $f_n$ is invariant under permutations of its
arguments.

Proof. Let $\pi_{ij}$ be the permutation that swaps $i$ with $j$ (leaving the other nodes intact);
with a slight abuse of notation, we also denote by $\pi_{ij}$ the mapping from $X^n$ to $X^n$
that swaps the $i$th and $j$th elements of a vector. (Note that $\pi_{ij}^{-1} = \pi_{ij}$.) We show
that for all $x \in X^n$, $f_n(x) = f_n(\pi_{ij}(x))$.

We run our distributed algorithm on the $n$-node complete graph with an edge
labeling. Note that at least one edge labeling for the complete graph exists: for example,
nodes $i$ and $j$ can use port number $(i + j) \bmod n$ for the edge connecting them.
Consider two different sets of initial values, namely the vectors (i) $x$, and (ii) $\pi_{ij}(x)$.
Let the port labeling in case (i) be arbitrary; in case (ii), let the port labeling be such
that the conditions in Lemma 9.1 are satisfied (which is easily accomplished). Since
the final value is $f_n(x)$ in case (i) and $f_n(\pi_{ij}(x))$ in case (ii), we obtain $f_n(x) = f_n(\pi_{ij}(x))$.
Since the permutations $\pi_{ij}$ generate the group of permutations, permutation invariance
follows. □

Let $x \in X^n$. We will denote by $x^2$ the concatenation of $x$ with itself, and, generally,
by $x^k$ the concatenation of $k$ copies of $x$. We now prove that self-concatenation does
not affect the value of a computable family of functions.

Lemma 9.3. Suppose that the family $(f_n)_{n=1,2,\ldots}$ is computable with infinite memory
on edge-labeled networks. Then, for every $n > 2$, every sequence $x \in X^n$, and every
positive integer $m$,

$$f_n(x) = f_{mn}(x^m).$$

Proof. Consider a ring of $n$ nodes, where the $i$th node clockwise begins with the $i$th
element of $x$; and consider a ring of $mn$ nodes, where the nodes $i, i+n, i+2n, \ldots$
(clockwise) begin with the $i$th element of $x$. Suppose that the labels in the first ring
are $0, 1, 2, 1, 2, \ldots$. That is, the label of the edge $(1, 2)$ is 0 and the labels of the
subsequent edges alternate between 1 and 2. In the second ring, we simply repeat $m$
times the labels in the first ring. See Figure 9-1 for an example with $n = 5$, $m = 2$.




Figure 9-1: Example of two situations that are algorithmically indistinguishable. The 
numbers next to each edge are the edge labels. 

Initially, the state $S_i(t) = [x_i, y_i(t), z_i(t), m_{i,1}(t), m_{i,2}(t)]$, with $t = 0$, of node $i$ in
the first ring is exactly the same as the state of the nodes $j = i, i+n, i+2n, \ldots$ in
the second ring. We show by induction that this property must hold at all times $t$.
(To keep notation simple, we assume, without any real loss of generality, that $i \neq 1$
and $i \neq n$.)

Indeed, suppose this property holds up to time $t$. At time $t$, node $i$ in the first
ring receives a message from node $i - 1$ and a message from node $i + 1$; and in the
second ring, node $j$ satisfying $j \pmod{n} = i$ receives one message from $j - 1$ and one from
$j + 1$. Since $j - 1 \pmod{n} = i - 1 \pmod{n}$ and $j + 1 \pmod{n} = i + 1 \pmod{n}$, the
states of $j - 1$ and $i - 1$ are identical at time $t$, and similarly for $j + 1$ and $i + 1$.
Thus, because of periodicity of the edge labels, nodes $i$ (in the first ring) and $j$ (in
the second ring) receive identical messages through identically labeled ports at time
$t$. Since $i$ and $j$ were in the same state at time $t$, they must be in the same state
at time $t + 1$. This proves that they are always in the same state. It follows that
$y_i(t) = y_j(t)$ for all $t$, whenever $j \pmod{n} = i$, and therefore $f_n(x) = f_{mn}(x^m)$. □
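The induction above can be checked numerically. The sketch below is an illustration under stated assumptions: nodes are indexed from 0, edge $k$ joins nodes $k$ and $k+1$ modulo the ring size, and `mix` is a hypothetical stand-in for an arbitrary deterministic transition law (it is even allowed to read its own port numbers, which only gives it more information than the automata of the model). The check confirms that node $j$ of the ring of $mn$ nodes always has the same memory state as node $j \bmod n$ of the ring of $n$ nodes.

```python
def ring_labels(n, m=1):
    """Edge labels 0, 1, 2, 1, 2, ... of an n-ring, repeated m times."""
    return ([0] + [1 + (k % 2) for k in range(n - 1)]) * m


def mix(*vals):
    """Hypothetical deterministic transition law on small integers."""
    h = 7
    for v in vals:
        h = (h * 31 + v + 1) % 1009
    return h


def simulate(x, labels, steps):
    """Memory states z(0), ..., z(steps) of all nodes on an edge-labeled ring."""
    N = len(x)
    z = [0] * N
    # msg[i][p]: message that node i last sent into its port numbered p
    msg = [{labels[(i - 1) % N]: 0, labels[i]: 0} for i in range(N)]
    trace = [list(z)]
    for _ in range(steps):
        new_z, new_msg = [], []
        for i in range(N):
            # (port number, neighbor) pairs of node i, ordered by port number
            ports = sorted([(labels[(i - 1) % N], (i - 1) % N),
                            (labels[i], (i + 1) % N)])
            # edge labeling: the neighbor uses the same port number for this edge
            seen = [v for p, j in ports for v in (p, msg[j][p])]
            new_z.append(mix(x[i], z[i], *seen))
            new_msg.append({p: mix(x[i], z[i], p, *seen) for p, _ in ports})
        z, msg = new_z, new_msg
        trace.append(list(z))
    return trace


def rings_agree(x, m, steps=30):
    """True if node j of the mn-ring always matches node j mod n of the n-ring."""
    n = len(x)
    small = simulate(x, ring_labels(n), steps)
    big = simulate(x * m, ring_labels(n, m), steps)
    return all(big[t][j] == small[t][j % n]
               for t in range(steps + 1) for j in range(n * m))
```

Calling `rings_agree((1, 0, 2, 2, 0), 2)` reproduces the setting of Figure 9-1 ($n = 5$, $m = 2$). The sketch requires at least three nodes per ring so that the two ports of each node carry distinct labels.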

Proof of Theorem 9.1. Let $x$ and $y$ be two sequences of $n$ and $m$ elements, respectively,
such that $p_k(x_1, \ldots, x_n)$ and $p_k(y_1, \ldots, y_m)$ are equal to a common value $p_k$,
for $k \in X$; thus, the number of occurrences of $k$ in $x$ and $y$ are $n p_k$ and $m p_k$, respectively.
Observe that for any $k \in X$, the vectors $x^m$ and $y^n$ have the same number
$mn$ of elements, and both contain $mn p_k$ occurrences of $k$. The sequences $y^n$ and
$x^m$ can thus be obtained from each other by a permutation, which by Lemma 9.2
implies that $f_{nm}(x^m) = f_{nm}(y^n)$. From Lemma 9.3, we have that $f_{nm}(x^m) = f_n(x)$
and $f_{mn}(y^n) = f_m(y)$. Therefore, $f_n(x) = f_m(y)$. This proves that the value of $f_n(x)$
is determined by the values of $p_k(x_1, \ldots, x_n)$, $k = 0, 1, \ldots, K$. □

9.4 Reduction of generic functions to the computation of averages

In this section, we turn to positive results, aiming at a converse of Theorem 9.1. The
centerpiece of our development is Theorem 9.2, proved in the previous chapter, which
states that a certain average-like function is computable. Theorem 9.2 then implies
the computability of a large class of functions, yielding an approximate converse to
Theorem 9.1.

The average-like functions that we consider correspond to the "interval consensus"
problem studied in [11]. They are defined as follows. Let $X = \{0, \ldots, K\}$. Let $Y$ be
the following set of single-point sets and intervals:

$$Y = \{\{0\}, (0, 1), \{1\}, (1, 2), \ldots, \{K-1\}, (K-1, K), \{K\}\}$$

(or equivalently, an indexing of this finite collection of intervals). For any $n$, let $f_n$
be the function that maps $(x_1, x_2, \ldots, x_n)$ to the element of $Y$ which contains the
average $\sum_i x_i / n$. We refer to the function family $(f_n)_{n=1,2,\ldots}$ as the interval-averaging
family.
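As a point of reference, the value of the interval-averaging function on a given input can be written down directly. The sketch below is a centralized evaluation of the definition, not the distributed algorithm of Chapter 8; the tuple encoding of singletons and open intervals is an assumption made for this illustration.

```python
from fractions import Fraction


def interval_average(x):
    """Element of Y that contains the average of the initial values x.

    A singleton {k} is encoded as ('point', k) and an open interval
    (k, k+1) as ('open', k, k+1); this encoding is illustrative only.
    """
    avg = Fraction(sum(x), len(x))      # exact rational average
    if avg.denominator == 1:
        return ('point', int(avg))
    lo = avg.numerator // avg.denominator
    return ('open', lo, lo + 1)
```

For instance, the input `[0, 1, 1]` has average $2/3$ and maps to the interval $(0, 1)$, while `[1, 1, 4]` has average exactly 2 and maps to the singleton $\{2\}$.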

The motivation for this function family comes from the fact that the exact average
$\sum_i x_i / n$ takes values in a countably infinite set, and cannot be computed when the
set $Y$ is finite. In the quantized averaging problem considered in the literature, one
settles for an approximation of the average. However, such approximations do not
necessarily define a single-valued function from $X^n$ into $Y$. In contrast, the above
defined function $f_n$ is both single-valued and delivers an approximation with an error
of size at most one. Note also that once the interval-average is computed, we can
readily determine the value of the average rounded down to an integer.

Theorem 9.2. The interval-averaging function family is computable. 

The proof of Theorem 9.2 (and the corresponding algorithm) was given in Chapter
8. In this section, we show that the computation of a broad class of functions can be
reduced to interval-averaging.

Since only frequency-based function families can be computable (Theorem 9.1), we
can restrict attention to the corresponding functions $h$. We will say that a function
$h$ on the unit simplex $D$ is computable if it corresponds to a frequency-based
computable family $(f_n)$. The level sets of $h$ are defined as the sets
$L(y) = \{p \in D \mid h(p) = y\}$, for $y \in Y$.

Theorem 9.3 (Sufficient condition for computability). Let $h$ be a function from the
unit simplex $D$ to $Y$. Suppose that every level set $L(y)$ can be written as a finite
union,

$$L(y) = \bigcup_k C_{i,k},$$

where each $C_{i,k}$ can in turn be written as a finite intersection of linear inequalities of
the form

$$\alpha_0 p_0 + \alpha_1 p_1 + \alpha_2 p_2 + \cdots + \alpha_K p_K \leq \alpha,$$

or

$$\alpha_0 p_0 + \alpha_1 p_1 + \alpha_2 p_2 + \cdots + \alpha_K p_K < \alpha,$$

with rational coefficients $\alpha, \alpha_0, \alpha_1, \ldots, \alpha_K$. Then, $h$ is computable.

Proof. Consider one such linear inequality, which we assume, for concreteness, to be
of the first type. Let $P$ be the set of indices $k$ for which $\alpha_k > 0$, and let $P^c$ be its
complement. Since all coefficients are rational, we can clear their denominators and
rewrite the inequality as

$$\sum_{k \in P} \beta_k p_k - \sum_{k \in P^c} \beta_k p_k \leq \beta, \qquad (9.1)$$

for nonnegative integers $\beta_k$ and $\beta$. Let $\chi_k$ be the indicator function associated with
initial value $k$, i.e., $\chi_k(i) = 1$ if $x_i = k$, and $\chi_k(i) = 0$ otherwise, so that
$p_k = \frac{1}{n} \sum_i \chi_k(i)$. Then, (9.1) becomes

$$\frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k \in P} \beta_k \chi_k(i) + \sum_{k \in P^c} \beta_k (1 - \chi_k(i)) \right) \leq \beta + \sum_{k \in P^c} \beta_k,$$

or

$$\frac{1}{n} \sum_{i=1}^{n} q_i \leq q^*,$$

where $q_i = \sum_{k \in P} \beta_k \chi_k(i) + \sum_{k \in P^c} \beta_k (1 - \chi_k(i))$ and $q^* = \beta + \sum_{k \in P^c} \beta_k$.

To determine whether the last inequality is satisfied, each node can compute $q_i$
and $q^*$, and then apply a distributed algorithm that computes the integer part of
$\frac{1}{n} \sum_{i=1}^{n} q_i$. This is possible by virtue of Theorem 9.2, with $K$ set to $\sum_k \beta_k$ (the largest
possible value of $q_i$). To check any finite collection of inequalities, the nodes can
perform the computations for each inequality in parallel.

To compute $h$, the nodes simply need to check which set $L(y)$ the frequencies
$p_0, p_1, \ldots, p_K$ lie in, and this can be done by checking the inequalities defining each
$L(y)$. All of these computations can be accomplished with finite automata: indeed, we
do nothing more than run finitely many copies of the automata provided by Theorem
9.2, one for each inequality. The total memory used by the automata depends on
the number of sets $C_{i,k}$ and the magnitude of the coefficients $\beta_k$, but not on $n$, as
required. □
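The reduction used in the proof can be traced on concrete numbers. The following sketch is an illustration only: it clears denominators, forms the local integers $q_i$ and the threshold $q^*$, and compares their average with $q^*$, all in a centralized way rather than through the distributed interval-averaging algorithm. The function names are hypothetical, and the scaled right-hand side `b` is allowed to be any integer here.

```python
from fractions import Fraction
from math import lcm


def reduce_inequality(alpha, a):
    """Clear denominators of sum_k alpha[k] * p_k <= a.

    Returns (beta, b, P) with nonnegative integers beta[k], an integer b,
    and the set P of indices with positive coefficient, so that the
    inequality reads sum_{k in P} beta[k] p_k - sum_{k not in P} beta[k] p_k <= b.
    """
    coeffs = [Fraction(c) for c in alpha] + [Fraction(a)]
    scale = lcm(*(c.denominator for c in coeffs))
    ints = [int(c * scale) for c in coeffs]
    P = {k for k, c in enumerate(ints[:-1]) if c > 0}
    beta = [abs(c) for c in ints[:-1]]
    return beta, ints[-1], P


def local_values(x, beta, b, P):
    """Per-node integers q_i and the common threshold q* from the proof."""
    comp = [k for k in range(len(beta)) if k not in P]
    q = [sum(beta[k] for k in P if xi == k) +
         sum(beta[k] for k in comp if xi != k) for xi in x]
    q_star = b + sum(beta[k] for k in comp)
    return q, q_star


def holds_reduced(x, alpha, a):
    """Check the inequality through the average of the q_i."""
    q, q_star = local_values(x, *reduce_inequality(alpha, a))
    return Fraction(sum(q), len(x)) <= q_star


def holds_direct(x, alpha, a):
    """Check the inequality directly on the frequencies p_k."""
    n = len(x)
    lhs = sum(Fraction(c) * Fraction(sum(1 for xi in x if xi == k), n)
              for k, c in enumerate(alpha))
    return lhs <= Fraction(a)
```

With `alpha = [Fraction(1, 2), Fraction(-1, 3), 0]` and `a = Fraction(1, 6)`, the scaled coefficients are $\beta = (3, 2, 0)$ with right-hand side 1, so $q^* = 3$; for the input `[0, 0, 1, 2]` the $q_i$ are 5, 5, 0, 2, whose average is exactly 3, and both checks report that the inequality holds.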

Theorem 9.3 shows the computability of functions $h$ whose level sets can be defined
by linear inequalities with rational coefficients. On the other hand, it is clear
that not every function $h$ can be computable. (This can be shown by a counting
argument: there are uncountably many possible functions $h$ on the rational elements
of $D$, but for the special case of bounded degree graphs, only countably many possible
algorithms.) Still, the next result shows that the set of computable functions is
rich enough, in the sense that computable functions can approximate any measurable
function, everywhere except possibly on a low-volume set.

We will call a set of the form $\prod_{k=0}^{K} (a_k, b_k)$, with every $a_k, b_k$ rational, a rational
open box, where $\prod$ stands for Cartesian product. A function that can be written
as a finite sum $\sum_i a_i 1_{B_i}$, where the $B_i$ are rational open boxes and the $1_{B_i}$ are the
associated indicator functions, will be referred to as a box function. Note that box
functions are computable by Theorem 9.3.

Corollary 9.4. If every level set of a function $h : D \to Y$ on the unit simplex $D$
is Lebesgue measurable, then, for every $\epsilon > 0$, there exists a computable box function
$h_\epsilon : D \to Y$ such that the set $\{p \in D \mid h(p) \neq h_\epsilon(p)\}$ has measure at most $\epsilon$.

Proof. (Outline) The proof relies on the following elementary result from measure
theory. Given a Lebesgue measurable set $E \subseteq D$ and some $\epsilon > 0$, there exists a set
$E'$ which is a finite union of disjoint open boxes, and which satisfies

$$\mu(E \,\triangle\, E') < \epsilon,$$

where $\mu$ is the Lebesgue measure. By a routine argument, these boxes can be taken
to be rational. By applying this fact to the level sets of the function $h$ (assumed
measurable), the function $h$ can be approximated by a box function $h_\epsilon$. Since box
functions are computable, the result follows. □

The following corollary states that continuous functions are approximable.

Corollary 9.5. If a function $h : D \to [L, U] \subseteq \mathbb{R}$ is continuous, then for every
$\epsilon > 0$ there exists a computable function $h_\epsilon : D \to [L, U]$ such that $\|h - h_\epsilon\|_\infty < \epsilon$.

Proof. Since $D$ is compact, $h$ is uniformly continuous. One can therefore partition
$D$ into a finite number of subsets, $A_1, A_2, \ldots, A_q$, that can be described by linear
inequalities with rational coefficients, so that $\max_{p \in A_j} h(p) - \min_{p \in A_j} h(p) < \epsilon$ holds
for all $A_j$. The function $h_\epsilon$ is then built by assigning to each $A_j$ an appropriate value
in $\{L, L + \epsilon, L + 2\epsilon, \ldots, U\}$. □

To illustrate these results, let us consider again some examples. 

(a) Majority voting between two options is equivalent to checking whether $p_1 < 1/2$,
with alphabet $\{0, 1\}$, and is therefore computable.

(b) Majority voting when some nodes can "abstain" amounts to checking whether
$p_1 - p_0 > 0$, with input set $X = \{0, 1, \text{abstain}\}$. This function family is computable.

(c) We can ask for the second most popular value out of four, for example. In this
case, the sets $A_i$ can be decomposed into constituent sets defined by inequalities
such as $p_2 < p_3 < p_4 < p_1$, each of which obviously has rational coefficients.

(d) For any subsets $I, I'$ of $\{0, 1, \ldots, K\}$, the indicator function of the set where
$\sum_{i \in I} p_i > \sum_{i \in I'} p_i$ is computable. This is equivalent to checking whether more
nodes have a value in $I$ than do in $I'$.

(e) The indicator functions of the sets defined by $p_1^2 < 1/2$ and $p_1 < \pi/4$ are
measurable, so they are approximable. We are unable to say whether they are
computable.

(f) The indicator function of the set defined by $p_1 p_2 < 1/8$ is approximable, but we
are unable to say whether it is computable.

9.4.1 Computability with infinite memory 

Finally, we show that with infinite memory, it is possible to recover the exact frequencies
$p_k$. (Note that this is impossible with finite memory, because $n$ is unbounded,
and the number of bits needed to represent $p_k$ is also unbounded.) The main difficulty
is that $p_k$ is a rational number whose denominator can be arbitrarily large, depending
on the unknown value of n. The idea is to run separate algorithms for each possible 
value of the denominator (which is possible with infinite memory) , and reconcile their 
results. 

Theorem 9.6. The vector $(p_0, p_1, \ldots, p_K)$ is computable with infinite memory.

Proof. We show that $p_1$ is computable exactly, which is sufficient to prove the theorem.
Consider the following algorithm, to be referred to as $Q_m$, parametrized by a positive
integer $m$. The input set $X_m$ is $\{0, 1, \ldots, m\}$ and the output set $Y_m$ is the same as
in the interval-averaging problem:
$Y_m = \{\{0\}, (0, 1), \{1\}, (1, 2), \{2\}, (2, 3), \ldots, \{m-1\}, (m-1, m), \{m\}\}$.
If $x_i = 1$, then node $i$ sets its initial value $x_{i,m}$ to $m$; else, the
node sets its initial value $x_{i,m}$ to 0. The algorithm computes the function family $(f_n)$
which maps $X_m^n$ to the element of $Y_m$ containing $(1/n) \sum_{i=1}^{n} x_{i,m}$, which is possible
by Theorem 9.2.

The nodes run the algorithms $Q_m$ for every positive integer value of $m$, in an
interleaved manner. Namely, at each time step, a node runs one step of a particular
algorithm $Q_m$, according to the following order:

$$Q_1, \quad Q_1, Q_2, \quad Q_1, Q_2, Q_3, \quad Q_1, Q_2, Q_3, Q_4, \quad Q_1, Q_2, \ldots$$

At each time $t$, let $m_i(t)$ be the smallest $m$ (if it exists) such that the output
$y_{i,m}(t)$ of $Q_m$ at node $i$ is a singleton (not an interval). We identify this singleton
with the numerical value of its single element, and we set $y_i(t) = y_{i,m_i(t)}(t)/m_i(t)$. If
$m_i(t)$ is undefined, then $y_i(t)$ is set to some default value, e.g., 0.

Let us fix a value of $n$. For any $m \leq n$, the definition of $Q_m$ and Theorem 9.2
imply that there exists a time after which the outputs $y_{i,m}$ of $Q_m$ do not change, and
are equal to a common value, denoted $y_m$, for every $i$. Moreover, at least one of the
algorithms $Q_1, \ldots, Q_n$ has an integer output $y_m$. Indeed, observe that $Q_n$ computes
$(1/n) \sum_{i=1}^{n} n 1_{x_i = 1} = \sum_{i=1}^{n} 1_{x_i = 1}$, which is clearly an integer. In particular, $m_i(t)$ is
eventually well-defined and bounded above by $n$. We conclude that there exists a time
after which the output $y_i(t)$ of our overall algorithm is fixed, shared by all nodes, and
different from the default value 0.

We now argue that this value is indeed $p_1$. Let $m^*$ be the smallest $m$ for which
the eventual output of $Q_m$ is a single integer $y_m$. Note that $y_{m^*}$ is the exact average
of the $x_{i,m^*}$, i.e.,

$$y_{m^*} = \frac{1}{n} \sum_{i=1}^{n} m^* 1_{x_i = 1} = m^* p_1.$$

For large $t$, we have $m_i(t) = m^*$ and therefore $y_i(t) = y_{i,m^*}(t)/m^* = p_1$, as desired.

Finally, it remains to argue that the algorithm described here can be implemented
with a sequence of infinite memory automata. All the above algorithm does is run a
copy of all the automata implementing $Q_1, Q_2, \ldots$ with time-dependent transitions.
This can be accomplished with an automaton whose state space is the countable set
$\mathbb{N} \times \bigcup_{m=1}^{\infty} \prod_{i=1}^{m} \mathcal{Q}_i$, where $\mathcal{Q}_i$ is the state space of $Q_i$, and the set $\mathbb{N}$ of integers is
used to keep track of time. □
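The reconciliation step of this proof can be illustrated with a short centralized sketch. It does not run the distributed algorithms $Q_m$; instead, `limit_output` is a hypothetical stand-in for the eventual output of $Q_m$ (an integer when the scaled average is an integer, `None` when it falls in an open interval), and `schedule` generates the interleaving order described above.

```python
from fractions import Fraction
from itertools import count


def schedule():
    """Yield 1; 1, 2; 1, 2, 3; ...: the index m of the algorithm run at each step."""
    for r in count(1):
        for m in range(1, r + 1):
            yield m


def limit_output(x, m):
    """Eventual output of Q_m, or None if the scaled average is not an integer."""
    avg = Fraction(m * sum(1 for xi in x if xi == 1), len(x))
    return int(avg) if avg.denominator == 1 else None


def recover_p1(x):
    """Smallest m with an integer output determines p_1 exactly."""
    for m in count(1):              # terminates by m = n at the latest
        y = limit_output(x, m)
        if y is not None:
            return Fraction(y, m)
```

For the input `[1, 0, 0, 1, 1, 0]`, the output of $Q_1$ lies in an open interval, $Q_2$ yields the integer 1, and the recovered frequency is $1/2$.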

9.5 Conclusions 

We have proposed a model of deterministic anonymous distributed computation, in- 
spired by the wireless sensor network and multi-agent control literature. We have 
given an almost tight characterization of the functions that are computable in our 
model. We have shown that computable functions must depend only on the
frequencies with which the different initial conditions appear, and that if this dependence
can be expressed in terms of linear inequalities with rational coefficients, the
function is indeed computable. Under weaker conditions, the function can be approx- 
imated with arbitrary precision. It remains open to exactly characterize the class of 
computable function families. 

Our positive results are proved constructively, by providing a generic algorithm 
for computing the desired functions. Interestingly, the finite memory requirement 
is not used in our negative results, which remain thus valid in the infinite memory 
case. In particular, we have no examples of functions that can be computed with
infinite memory but are provably not computable with finite memory. We suspect
though that simple examples exist; a good candidate could be the indicator function
$1_{p_1 < 1/\pi}$, which checks whether the fraction of nodes with a particular initial condition
is smaller than $1/\pi$.



Chapter 10

Concluding remarks and a list of 
open problems 

This thesis investigated several aspects of the convergence times of averaging algo- 
rithms and the effects of quantized communication and storage on performance. Our 
goal has been to try to understand some aspects of the tradeoffs between robustness, 
storage, and convergence speed in distributed systems. 
The main results of this thesis are: 

1. An $O((n^2/\eta) B \log(1/\epsilon))$ upper bound on the convergence time of products of
doubly stochastic matrices (which lead to a class of averaging algorithms) in
Chapter 4. This is the first polynomial upper bound on the convergence time
of averaging algorithms.

2. An $O(n^2 B \log(1/\epsilon))$ averaging algorithm in Chapter 5. This is currently the
averaging algorithm with the best theoretical guarantees.

3. An $\Omega(n^2)$ lower bound in Chapter 6 on the convergence time of any distributed
averaging algorithm that uses a single scalar state variable at each agent and
satisfies a natural "smoothness" condition.

4. The finding of Chapter 7 that storing and transmitting $c \log n$ bits can lead to
arbitrarily accurate computation of the average by simply quantizing any linear,
doubly stochastic averaging scheme.

5. The deterministic algorithm in Chapter 8 which, given initial values of 0 or 1
at each node, can compute which value has the majority with only a constant
number of bits stored per link at each node.

6. The computability and non-computability results of Chapter 9, which tell us
that with deterministic anonymous algorithms and a constant number of bits
per link at each node we can compute only functions depending on proportions,
and that all the "nice" functions of proportions are computable.

Many questions remain unanswered. We begin by listing several that are central
to understanding the fundamentals of information aggregation in networks.

1. Is it possible to come up with local averaging algorithms which are robust to
link failures, smoothly update a collection of real numbers of fixed size, and
average faster than $O(n^2 B \log(1/\epsilon))$ on $B$-connected graph sequences?

2. What if in addition to the above requirements we also ask that the convergence
time be on the order of the diameter for a (time-invariant) geometric random
graph?

3. What if in addition to the above requirements we also ask that the convergence
time be on the order of the diameter for any fixed graph?

4. Suppose every node starts out with a 0 or 1. Is it possible to deterministically
compute which value has the majority initially if the graph sequence $G(t)$
changes unpredictably? We do insist, however, that each $G(t)$ be undirected,
and we require that each node store only a constant number of bits per
link it maintains. Naturally, some additional connectivity assumptions on $G(t)$
will have to be imposed for any positive result.

5. More generally, characterize exactly which functions of binary initial values can
be computed with a deterministic algorithm which maintains a constant number
of bits per link at each node. The graph sequence $G(t)$ is undirected, but may
be either fixed or change unpredictably.

We next list some other open questions motivated by the problems considered 
here. 

1. For which classes of algorithms can an $\Omega(n^2)$ lower bound on convergence time
be proven? In particular, is it possible to replace the assumption of Chapter 6
that the update function be differentiable by the weaker assumption that the
update function be piecewise differentiable? What if we add some memory to
the algorithm?

2. Here is a concrete instance of the previous question. Is it true that any doubly
stochastic matrix "on the ring" mixes as $\Omega(n^2)$? That is, let $P$ be a doubly
stochastic matrix such that $P_{ij} = 0$ if $|i - j| \bmod n > 1$. Is it true that
$\max_{\lambda(P) \neq 1} |\lambda(P)| \geq 1 - c/n^2$ for some constant $c$?

Major progress on this question was recently made in 

3. Where is the dividing line in Chapter 4 between polynomial and exponential
convergence time? In particular, how far may we relax the double stochasticity
Assumption 3.4 and still maintain polynomial convergence time? For example,
does polynomial time convergence still hold if we replace Assumption 3.4 with
the requirement that the matrices $A(t)$ be (row) stochastic and each column
sum is in $[1 - \epsilon, 1 + \epsilon]$ for some small $\epsilon$?

4. Let $A_k$ be an irreducible, aperiodic stochastic matrix in $\mathbb{R}^{k \times k}$. Let $\pi_k$ be its
stationary distribution. Can we identify interesting classes of sequences $A_1, A_2, \ldots$
which satisfy $\lim_{k \to \infty} \pi_k(i) = 0$ for any $i$? In words, we are asking for each agent
to have a negligible influence on the final result in the limit. This is sometimes 
a useful property in estimation problems; see 

5. Is there a decentralized way to pick a symmetric ($a_{ij} = a_{ji}$), stochastic linear
update rule which minimizes



This corresponds to handling the effect of white noise disturbances optimally;
see [100] for a centralized algorithm based on convex optimization.

6. For which classes of correlated random variables can we design decentralized
algorithms for maximum likelihood estimation as done in Chapter 2?

7. Suppose every node starts out with a 0 or 1. The communication graph is
undirected, fixed, but unknown to the nodes. Each node can maintain a fixed
number of bits per link it maintains. Is it possible to (deterministically)
decide whether $(1/n) \sum_{i=1}^n x_i > 1/\sqrt{2}$? What about $(1/n) \sum_{i=1}^n x_i > 1/\pi$?

8. Is it possible to improve the running time of the interval-averaging algorithm
of Chapter 8 to $O(n^2 K \log K)$ communication rounds?

9. Given a connected graph $G = (\{1, \ldots, n\}, E)$, suppose that every link $e$ is
an erasure channel with erasure probability $p_e$. That is, every node can send
a message on each link at each time step, but the message is lost with
probability $p_e$. Nodes do not know whether their messages are lost. What is
the time complexity of average or sum computation in such a setting? How
does it relate to various known graph connectivity measures?

10. How should one design averaging algorithms which send as few messages as
possible? We ask that these algorithms work on arbitrary $B$-connected undirected
graph sequences.

Appendix A



On quadratic Lyapunov functions 
for averaging 

This appendix focuses on a technical issue appearing in Chapter 3; its content has
previously appeared in the M.S. thesis [77] and the paper [75].

Specifically, in Chapter 3, a number of theorems on the convergence of the process

$$x(t + 1) = A(t)x(t)$$

were proved, e.g., Theorems 3.1, 3.2, 3.3, and 3.4. These theorems were proved by showing
that the "span norm" $\max_i x_i(t) - \min_i x_i(t)$ is guaranteed to decrease after a
certain number of iterations. Unfortunately, this proof method usually gives an overly
conservative bound on the convergence time of the algorithm. Tighter bounds on
the convergence time would have to rely on alternative Lyapunov functions, such as
quadratic ones, of the form $x^T M x$, if they exist.

Moreover, in Chapter 4 we developed bounds for convergence of averaging
algorithms based on quadratic Lyapunov functions. Thus we are led to the question: is
it possible to find quadratic Lyapunov functions for the non-averaging convergence
theorems of Chapter 3? A positive answer might lead to improved convergence times.

Although quadratic Lyapunov functions can always be found for linear systems, 
they may fail to exist when the system is allowed to switch between a fixed number 
of linear modes. On the other hand, there are classes of such switched linear systems 
that do admit quadratic Lyapunov functions. See [61] for a broad overview of the 
literature on this subject. 

The simplest version of this question deals with the symmetric, equal-neighbor 
model and was investigated in [52]. The authors write: 

"...no such common Lyapunov matrix M exists. While we have not been 
able to construct a simple analytical example which demonstrates this, 
we have been able to determine, for example, that no common quadratic 
Lyapunov function exists for the class of all [graphs which have] 10 vertices 
and are connected. One can verify that this is so by using semidefinite
programming..."

The aim of this appendix is to provide an analytical example that proves this fact.

A.1 The Example

Let us fix a positive integer $n$. We start by defining a class $\mathcal{Q}$ of functions with some
minimal desired properties of quadratic Lyapunov functions. Let $\mathbf{1}$ be the vector in
$\mathbb{R}^n$ with all components equal to 1. A square matrix is said to be stochastic if it is
nonnegative and the sum of the entries in each row is equal to one. Let $\mathcal{A} \subset \mathbb{R}^{n \times n}$ be
the set of stochastic matrices $A$ such that: (i) $a_{ii} > 0$, for all $i$; (ii) all positive entries
on any given row of $A$ are equal; (iii) $a_{ij} > 0$ if and only if $a_{ji} > 0$; (iv) the graph
associated with the set of edges $\{(i, j) \mid a_{ij} > 0\}$ is connected. These are precisely
the matrices that correspond to a single iteration of the equal-neighbor algorithm on
symmetric, connected graphs.
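As an illustration of the set $\mathcal{A}$, the following minimal sketch builds the matrix of one equal-neighbor iteration from an undirected edge list and tracks the span norm $\max_i x_i - \min_i x_i$ along a few iterations. The function names, the 0-based node labels, and the path graph used as an example are assumptions made for this sketch, not part of the original text.

```python
from fractions import Fraction


def equal_neighbor_matrix(n, edges):
    """Row-stochastic matrix of one equal-neighbor iteration.

    Nodes are 0..n-1 (hypothetical labeling); every node counts itself as a
    neighbor, so the diagonal is positive, and all positive entries in a row
    are equal to 1 / (number of neighbors, including the node itself).
    """
    neighbors = [{i} for i in range(n)]
    for i, j in edges:  # undirected edge: a_ij > 0 if and only if a_ji > 0
        neighbors[i].add(j)
        neighbors[j].add(i)
    return [[Fraction(1, len(neighbors[i])) if j in neighbors[i] else Fraction(0)
             for j in range(n)] for i in range(n)]


def step(A, x):
    """Compute A x for a list-of-lists matrix A."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def span(x):
    """The span norm max_i x_i - min_i x_i."""
    return max(x) - min(x)


# Example: path graph on four nodes (an arbitrary connected graph).
A = equal_neighbor_matrix(4, [(0, 1), (1, 2), (2, 3)])
x = [Fraction(v) for v in (4, 0, 0, -4)]
spans = [span(x)]
for _ in range(5):
    x = step(A, x)
    spans.append(span(x))
```

Exact rational arithmetic is used so that the stochasticity of each row and the monotonicity of the span can be checked without rounding concerns.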

Definition A.1. A function $Q : \mathbb{R}^n \to \mathbb{R}$ belongs to the class $\mathcal{Q}$ if it is of the form
$Q(x) = x^T M x$, where:

(a) The matrix $M \in \mathbb{R}^{n \times n}$ is nonzero, symmetric, and nonnegative definite.

(b) For every $A \in \mathcal{A}$, and $x \in \mathbb{R}^n$, we have $Q(Ax) \leq Q(x)$.

(c) We have $Q(\mathbf{1}) = 0$.

Note that condition (b) may be rewritten in matrix form as

$$x^T A^T M A x \leq x^T M x, \quad \text{for all } A \in \mathcal{A} \text{ and } x \in \mathbb{R}^n. \tag{A.1}$$

The rationale behind condition (c) is as follows. Let $S$ be the subspace spanned by
the vector $\mathbf{1}$. Since we are interested in convergence to the set $S$, and every element
of $S$ is a fixed point of the algorithm, it is natural to require that $Q(\mathbf{1}) = 0$, or,
equivalently,

$$M\mathbf{1} = 0.$$

Of course, for a Lyapunov function to be useful, additional properties would be
desirable. For example, we should require some additional condition that guarantees that
$Q(x(t))$ eventually decreases. However, according to Theorem A.2, even the minimal
requirements in Definition A.1 are sufficient to preclude the existence of a quadratic
Lyapunov function.

Theorem A.2. Suppose that $n \geq 8$. Then, the class $\mathcal{Q}$ (cf. Definition A.1) is empty.

The idea of the proof is as follows. Using the fact that the dynamics of the system
are essentially the same when we rename the components, we show that if $x^T M x$ has
the desired properties, so does $x^T Z x$ for a matrix $Z$ that has certain
permutation-invariance properties. This leads us to the conclusion that there is essentially a single
candidate Lyapunov function, for which a counterexample is easy to develop.

Recall that a permutation of $n$ elements is a bijective mapping $\sigma : \{1, \ldots, n\} \to
\{1, \ldots, n\}$. Let $\Pi$ be the set of all permutations of $n$ elements. For any $\sigma \in \Pi$, we
define a corresponding permutation matrix $P_\sigma$ by letting the $i$th component of $P_\sigma x$ be
equal to $x_{\sigma(i)}$. Note that $P_\sigma^{-1} = P_\sigma^T$, for all $\sigma \in \Pi$. Let $\mathcal{P}$ be the set of all permutation
matrices corresponding to permutations in $\Pi$.

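Lemma A.1. Let $M \in \mathcal{Q}$ (here and below, we write $M \in \mathcal{Q}$ to mean that the
function $x \mapsto x^T M x$ belongs to $\mathcal{Q}$). Define $Z$ as

$$Z = \sum_{P \in \mathcal{P}} P^T M P.$$

Then, $Z \in \mathcal{Q}$.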

Proof: For every matrix $A \in \mathcal{A}$, and any $P \in \mathcal{P}$, it is easily seen that $P A P^T \in \mathcal{A}$.
This is because the transformation $A \mapsto P A P^T$ amounts to permuting the rows and
columns of $A$, which is the same as permuting (renaming) the nodes of the graph.

We claim that if $M \in \mathcal{Q}$ and $P \in \mathcal{P}$, then $P^T M P \in \mathcal{Q}$. Indeed, if $M$ is nonzero,
symmetric, and nonnegative definite, so is $P^T M P$. Furthermore, since $P\mathbf{1} = \mathbf{1}$, if
$M\mathbf{1} = 0$, then $P^T M P \mathbf{1} = 0$. To establish condition (b) in Definition A.1, let us
introduce the notation $Q_P(x) = x^T (P^T M P) x$. Fix a vector $x \in \mathbb{R}^n$, and $A \in \mathcal{A}$;
define $B = P A P^T \in \mathcal{A}$. We have

$$\begin{aligned}
Q_P(Ax) &= x^T A^T P^T M P A x \\
        &= x^T P^T P A^T P^T M P A P^T P x \\
        &= x^T P^T B^T M B P x \\
        &\leq x^T P^T M P x \\
        &= Q_P(x),
\end{aligned}$$

where the inequality follows by applying Eq. (A.1), which is satisfied by $M$, to the
vector $Px$ and the matrix $B$. We conclude that $Q_P \in \mathcal{Q}$.

Since the sum of matrices in $\mathcal{Q}$ remains in $\mathcal{Q}$, it follows that $Z = \sum_{P \in \mathcal{P}} P^T M P$
belongs to $\mathcal{Q}$. ■

We define the "sample variance" $V(x)$ of the values $x_1, \ldots, x_n$, by

$$V(x) = \sum_{i=1}^n (x_i - \bar{x})^2,$$

where $\bar{x} = (1/n) \sum_{i=1}^n x_i$. This is a nonnegative quadratic function of $x$, and therefore,
$V(x) = x^T C x$, for a suitable nonnegative definite, nonzero symmetric matrix $C \in \mathbb{R}^{n \times n}$.

Lemma A.2. There exists some $\alpha > 0$ such that

$$x^T Z x = \alpha V(x), \quad \text{for all } x \in \mathbb{R}^n.$$

Proof: We observe that the matrix $Z$ satisfies

$$R^T Z R = Z, \quad \text{for all } R \in \mathcal{P}. \tag{A.2}$$

To see this, fix $R$ and notice that the mapping $P \mapsto PR$ is a bijection of $\mathcal{P}$ onto itself,
and therefore,

$$R^T Z R = \sum_{P \in \mathcal{P}} (PR)^T M (PR) = \sum_{P \in \mathcal{P}} P^T M P = Z.$$

We will now show that condition (A.2) determines $Z$, up to a multiplicative factor.

Let $z_{ij}$ be the $(i, j)$th entry of $Z$. Let $\mathbf{1}^{(i)}$ be the $i$th unit vector, so that
$\mathbf{1}^{(i)T} Z \mathbf{1}^{(i)} = z_{ii}$. Let $P \in \mathcal{P}$ be a permutation matrix that satisfies
$P\mathbf{1}^{(i)} = \mathbf{1}^{(j)}$. Then,

$$z_{ii} = \mathbf{1}^{(i)T} Z \mathbf{1}^{(i)} = \mathbf{1}^{(i)T} P^T Z P \mathbf{1}^{(i)} = \mathbf{1}^{(j)T} Z \mathbf{1}^{(j)} = z_{jj}.$$

Therefore, all diagonal entries of $Z$ have a common value, to be denoted by $z$.

Let us now fix three distinct indices $i, j, k$, and let $y = \mathbf{1}^{(i)} + \mathbf{1}^{(j)}$, $w = \mathbf{1}^{(i)} + \mathbf{1}^{(k)}$.
Let $P \in \mathcal{P}$ be a permutation matrix such that $P\mathbf{1}^{(i)} = \mathbf{1}^{(i)}$ and $P\mathbf{1}^{(j)} = \mathbf{1}^{(k)}$, so that
$Py = w$. We have

$$2z + 2z_{ij} = y^T Z y = y^T P^T Z P y = w^T Z w = 2z + 2z_{ik}.$$

By repeating this argument for different choices of $i, j, k$, it follows that all off-diagonal
entries of $Z$ have a common value, to be denoted by $r$. Using also the property that
$Z\mathbf{1} = 0$, we obtain that $z + (n - 1)r = 0$. This shows that the matrix $Z$ is uniquely
determined, up to a multiplicative factor.

We now observe that permuting the components of a vector $x$ does not change
the value of $V(x)$. Therefore, $V(x) = V(Px)$ for every $P \in \mathcal{P}$, which implies that
$x^T P^T C P x = x^T C x$, for all $P \in \mathcal{P}$ and $x \in \mathbb{R}^n$. Thus, $C$ satisfies (A.2). Since
all matrices that satisfy (A.2) and annihilate $\mathbf{1}$ are scalar multiples of each other, the
desired result follows. ■
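The algebraic step in Lemma A.2 can be checked on a small instance. The sketch below symmetrizes a matrix over all permutations and compares the result with the sample-variance matrix $C = I - (1/n)\mathbf{1}\mathbf{1}^T$. The choice $n = 4$ and the matrix $M = vv^T$ with $v = (1, -1, 0, 0)$ are assumptions made purely for illustration: this $M$ is symmetric, nonnegative definite, and satisfies $M\mathbf{1} = 0$, but it is not claimed to satisfy condition (b) of Definition A.1.

```python
from fractions import Fraction
from itertools import permutations


def symmetrize(M):
    """Return Z = sum over all permutation matrices P of P^T M P.

    Entry (i, j) of P^T M P is M evaluated at permuted indices; because the sum
    runs over every permutation, it does not matter whether sigma or its
    inverse is used to index M.
    """
    n = len(M)
    Z = [[Fraction(0)] * n for _ in range(n)]
    for sigma in permutations(range(n)):
        for i in range(n):
            for j in range(n):
                Z[i][j] += M[sigma[i]][sigma[j]]
    return Z


n = 4
v = [1, -1, 0, 0]                      # illustrative choice, orthogonal to 1
M = [[Fraction(v[i] * v[j]) for j in range(n)] for i in range(n)]
Z = symmetrize(M)

# Matrix of the sample variance: V(x) = x^T C x with C = I - (1/n) 1 1^T.
C = [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)] for i in range(n)]

alpha = Z[0][0] / C[0][0]
is_multiple = all(Z[i][j] == alpha * C[i][j] for i in range(n) for j in range(n))
```

In this instance every diagonal entry of $Z$ equals 12 and every off-diagonal entry equals $-4$, so that $z + (n - 1)r = 0$ and $Z = 16\,C$.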

Proof of Theorem A.2: In view of Lemmas A.1 and A.2, if $\mathcal{Q}$ is nonempty, then
$V \in \mathcal{Q}$. Thus, it suffices to show that $V \notin \mathcal{Q}$. Suppose that $n \geq 8$, and consider
the vector $x$ with components $x_1 = 5$, $x_2 = x_3 = x_4 = 2$, $x_5 = 0$, $x_6 = x_7 = -3$,
$x_8 = -5$, and $x_9 = \cdots = x_n = 0$. We then have $V(x) = 80$. Consider the outcome of
one iteration of the symmetric, equal-neighbor algorithm, if the graph has the form
shown in Figure A-1. After the iteration, we obtain the vector $y$ with components
$y_1 = 11/5$, $y_2 = y_3 = y_4 = 7/2$, $y_5 = 0$, $y_6 = y_7 = -4$, $y_8 = -11/4$, and
$y_9 = \cdots = y_n = 0$. We have

$$V(y) = \sum_{i=1}^n (y_i - \bar{y})^2 \geq \sum_{i=1}^8 (y_i - \bar{y})^2 \geq \sum_{i=1}^8 \Big( y_i - \frac{1}{8} \sum_{j=1}^8 y_j \Big)^2, \tag{A.3}$$

where $\bar{y} = (1/n) \sum_{i=1}^n y_i$ and we used that $\sum_{i=1}^8 (y_i - z)^2$ is minimized when
$z = (1/8) \sum_{i=1}^8 y_i$. A simple calculation shows that the last expression in (A.3)
evaluates to $258167/3200 \approx 80.68$, which implies that $V(y) > V(x)$. Thus, if $n \geq 8$,
$V \notin \mathcal{Q}$, and the set $\mathcal{Q}$ is empty. ■

Figure A-1: A connected graph on $n$ nodes showing that $V(x)$ is not a Lyapunov
function when $n \geq 8$. All arcs of the form $(i, i)$ are also assumed to be present, but
are not shown. The nodes perform an iteration of the symmetric, equal-neighbor
model according to this graph.
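The counterexample can be replayed in exact arithmetic. The sketch below covers the case $n = 8$. Since Figure A-1 is not reproduced in this text, the edge list is an assumption inferred from the stated values of $y$ (node 1 adjacent to nodes 2, 3, 4, 5; node 5 adjacent to node 8; node 8 adjacent to nodes 6 and 7); the function names are likewise hypothetical.

```python
from fractions import Fraction

# ASSUMPTION: undirected edges (1-based labels) inferred from the values of y
# given in the proof; self-arcs are added inside equal_neighbor_step.
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (5, 8), (6, 8), (7, 8)]

x = {i: Fraction(v) for i, v in enumerate([5, 2, 2, 2, 0, -3, -3, -5], start=1)}


def equal_neighbor_step(values, edges):
    """Each node replaces its value by the average over itself and its neighbors."""
    nbrs = {i: {i} for i in values}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return {i: sum(values[j] for j in nbrs[i]) / len(nbrs[i]) for i in values}


def sample_variance(vals):
    """Sum of squared deviations from the mean (the quantity V in the text)."""
    vals = list(vals)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


y = equal_neighbor_step(x, EDGES)
V_x = sample_variance(x.values())
V_y = sample_variance(y.values())
increased = V_y > V_x
```

Under these assumptions the iteration reproduces the components of $y$ listed in the proof, and the sample variance grows from 80 to roughly 80.68 after a single equal-neighbor step.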



A.2 Conditions for the Existence of a Quadratic
Lyapunov Function

Are there some additional conditions (e.g., restricting the matrices $A$ to a set smaller
than $\mathcal{A}$), under which a quadratic Lyapunov function is guaranteed to exist? We
start by showing that the answer is positive for the case of a fixed matrix (that is, if
the graph $G(t)$ is the same for all $t$).

Let $A$ be a stochastic matrix, and suppose that there exists a positive vector $\pi$
such that $\pi^T A = \pi^T$. Without loss of generality, we can assume that $\pi^T \mathbf{1} = 1$. It is
known that in this case,

$$x^T A^T D A x \leq x^T D x, \quad \forall\, x \in \mathbb{R}^n, \tag{A.4}$$

where $D$ is a diagonal matrix, whose $i$th diagonal entry is equal to $\pi_i$ (cf. Lemma 6.4
in [15]). However, $x^T D x$ cannot be used as a Lyapunov function because $D\mathbf{1} \neq 0$ (cf.
condition (c) in Definition A.1). To remedy this, we argue as in [17] and define the
matrix $H = I - \mathbf{1}\pi^T$, and consider the choice $M = H^T D H$. Note that $M$ has rank
$n - 1$.

We have $H\mathbf{1} = (I - \mathbf{1}\pi^T)\mathbf{1} = \mathbf{1} - \mathbf{1}(\pi^T \mathbf{1}) = \mathbf{1} - \mathbf{1} = 0$, as desired. Furthermore,

$$HA = A - \mathbf{1}\pi^T A = A - \mathbf{1}\pi^T = A - A\mathbf{1}\pi^T = AH.$$

Using this property, we obtain, for every $x \in \mathbb{R}^n$,

$$x^T A^T M A x = x^T A^T H^T D H A x = (x^T H^T) A^T D A (Hx) \leq x^T H^T D H x = x^T M x,$$

where the inequality was obtained from (A.4), applied to $Hx$. This shows that $H^T D H$
has the desired properties (a)-(c) of Definition A.1, provided that $\mathcal{A}$ is replaced with
$\{A\}$.
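The construction $M = H^T D H$ can be tried out numerically. The following minimal sketch uses, as an assumed example, the equal-neighbor matrix of a three-node path, whose stationary vector is $\pi = (2/7, 3/7, 2/7)$; the helper names and the random test vectors are illustrative choices, and the loop is a spot check rather than a proof.

```python
from fractions import Fraction as F
import random


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def quad(M, x):
    """The quadratic form x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


# Assumed example: equal-neighbor matrix of the path 1 - 2 - 3 (with self-arcs).
A = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(0), F(1, 2), F(1, 2)]]
pi = [F(2, 7), F(3, 7), F(2, 7)]       # positive, sums to one
n = len(A)
pi_A = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]  # should equal pi

D = [[pi[i] if i == j else F(0) for j in range(n)] for i in range(n)]
H = [[F(int(i == j)) - pi[j] for j in range(n)] for i in range(n)]  # H = I - 1 pi^T
M = matmul(matmul(transpose(H), D), H)

M_times_ones = [sum(row) for row in M]  # condition (c): should be the zero vector

rng = random.Random(0)                  # fixed seed so the check is repeatable
never_increases = True
for _ in range(200):
    x = [F(rng.randint(-10, 10)) for _ in range(n)]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    if quad(M, Ax) > quad(M, x):        # would violate condition (b)
        never_increases = False
```

Exact fractions are used so that the comparison in condition (b) is not affected by rounding.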

We have just shown that every stochastic matrix (with a positive left eigenvector
associated to the eigenvalue 1) is guaranteed to admit a quadratic Lyapunov function,
in the sense of Definition A.1. Moreover, our discussion implies that there are some
classes of stochastic matrices $\mathcal{B}$ for which the same Lyapunov function can be used
for all matrices in the class.

(a) Let $\mathcal{B}$ be a set of stochastic matrices. Suppose that there exists a positive vector
$\pi$ such that $\pi^T \mathbf{1} = 1$, and $\pi^T A = \pi^T$ for all $A \in \mathcal{B}$. Then, there exists a nonzero,
symmetric, nonnegative definite matrix $M$, of rank $n - 1$, such that $M\mathbf{1} = 0$,
and $x^T A^T M A x \leq x^T M x$, for all $x$ and $A \in \mathcal{B}$.

(b) The condition in (a) above is automatically true if all the matrices in $\mathcal{B}$ are
doubly stochastic (recall that a matrix $A$ is doubly stochastic if both $A$ and
$A^T$ are stochastic); in that case, we can take $\pi = (1/n)\mathbf{1}$.

(c) The condition in (a) above holds if and only if there exists a positive vector
$\pi$, such that $\pi^T A x = \pi^T x$, for all $A \in \mathcal{B}$ and all $x$. In words, there must be
a positive linear functional of the agents' opinions which is conserved at each
iteration. For the case of doubly stochastic matrices, this linear functional is
any positive multiple of the sum $\sum_{i=1}^n x_i$ of the agents' values (e.g., the average
of these values).

Appendix B

Averaging with deadlock avoidance



In this appendix, we describe a solution of a certain averaging problem communicated 
to the author and J.N. Tsitsiklis by A.S. Morse. This is the problem of averaging 
with "deadlock avoidance," namely averaging with the requirement that each node 
participate in at most one pairwise average at every time step. 

More concretely, we will describe a deterministic distributed algorithm for nodes
$1, \ldots, n$, with starting values $x_1(0), \ldots, x_n(0)$, to compute the average $(1/n) \sum_{i=1}^n x_i(0)$.
Our algorithm selects pairs of nodes to average at each time step. We will allow the
communication graphs $G(t) = (\{1, \ldots, n\}, E(t))$ to change with time, but we will
assume that each graph $G(t)$ is undirected. Moreover, our algorithm will involve
three rounds of message exchanges between nodes and their neighbors; we assume
these messages can be sent before the graph $G(t)$ changes. We do not assume nodes
have unique identifiers; however, we do assume each node has a "port labeling" which
allows it to tell neighbors apart, that is, if node $i$ has $d(i)$ neighbors in the graph
$G(t)$, it assigns them labels $1, \ldots, d(i)$ in some arbitrary way.

B.1 Algorithm description

We will partition every time step into several synchronous "periods." At the end of
each period, nodes send each other messages, which all of them synchronously read at
the beginning of the following period. During the first period, nodes exchange their
values with their neighbors; after that, nodes will send messages from the binary
alphabet $\{+, -\}$. Eventually, as a consequence of these messages, nodes will "pair
up" and each node will average its value with that of a selected neighbor.

Intuitively, nodes will send "+"s to neighbors they would like to average with, and
"-"s to decline averaging requests. Their goal will be to average with the neighbor
whose value is the farthest from their own.

The following is a description of the actions undertaken by node $i$ at time $t$. At
this point the nodes have values $x_1(t), \ldots, x_n(t)$. Node $i$ will keep track of a variable
$N(i)$, representing the node it would like to average with; and a variable $g_i$, which
will always be equal to $|x_i(t) - x_{N(i)}(t)|$.

We find it convenient to label the nodes $1, \ldots, n$ to make statements like "node
$i$ sets $N(i) = j$." Of course, due to the absence of unique identifiers, node $i$ will set
its local variable $N(i)$ to its port number for its neighbor $j$, and the above statement
should be understood as such.

The algorithm (at node $i$, time $t$):

1. Initialize $N(i) = i$ and $g_i = 0$. Node $i$ broadcasts its value to all its neighbors.

2. Node $i$ reads the incoming messages. Next, node $i$ sends a "+" to a neighbor
$k$ with the smallest value among neighbors with value smaller than $x_i(t)$. Ties
can be broken arbitrarily.^1 It sets $N(i) = k$ and $g_i = |x_i(t) - x_k(t)|$.

If no such $k$ exists, node $i$ does nothing during this period.

3. Node $i$ reads incoming messages. If node $i$ received at least one "+" at the
previous step, it will compute the gap $|x_i(t) - x_j(t)|$ for every neighbor $j$ in the
set $J$ of neighbors that have sent it a "+."

• If the node $m \in J$ with the largest^2 gap has $|x_m(t) - x_i(t)| > g_i$, $i$ will
update $N(i) = m$, $g_i = |x_m(t) - x_i(t)|$; next, $i$ will send a "+" to $m$ and
"-" to all the other nodes in $J$. Moreover, if $i$ sent a "+" to a node $k$ in
step 2, it now sends $k$ a "-."

• On the other hand, if $|x_m(t) - x_i(t)| \leq g_i$, node $i$ will send a "-" to
everyone in $J$.

4. Node $i$ reads incoming messages. If $i$ receives a "-" from node $N(i)$, it sets
$N(i) = i$, $g_i = 0$.

5. Finally, $i$ sets

$$x_i(t + 1) = \frac{x_i(t) + x_{N(i)}(t)}{2}.$$

Observe that if $N(i) = i$ at the execution of step 5, then the value of node $i$ is
left unchanged.
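The five steps above can be sketched as a centralized simulation of one time step. This is a minimal illustration, not the distributed implementation: node indices stand in for port numbers, ties are broken in favor of the lowest index (an assumed deterministic rule), and the function and variable names are hypothetical.

```python
from fractions import Fraction


def pairwise_averaging_step(x, adj):
    """Simulate one time step; returns (new_values, partner).

    x   : list of node values at time t
    adj : dict mapping each node to the list of its neighbors (undirected graph)
    partner[i] plays the role of N(i) at the start of step 5.
    """
    x = [Fraction(v) for v in x]
    n = len(x)
    partner = list(range(n))             # step 1: N(i) = i
    gap = [Fraction(0)] * n              # step 1: g_i = 0

    # Step 2: offer ("+") to the smallest-valued neighbor that is below x_i.
    offer = [None] * n
    for i in range(n):
        lower = [k for k in adj[i] if x[k] < x[i]]
        if lower:
            k = min(lower, key=lambda k: (x[k], k))   # tie: lowest index
            offer[i], partner[i], gap[i] = k, k, x[i] - x[k]

    # Step 3: answer the offers received; record every "-" as (sender, receiver).
    declined = set()
    for i in range(n):
        J = [j for j in adj[i] if offer[j] == i]
        if not J:
            continue
        m = max(J, key=lambda j: (x[j] - x[i], -j))   # largest gap, tie: lowest index
        if x[m] - x[i] > gap[i]:
            partner[i], gap[i] = m, x[m] - x[i]
            declined.update((i, j) for j in J if j != m)
            if offer[i] is not None:                  # withdraw own step-2 offer
                declined.add((i, offer[i]))
        else:
            declined.update((i, j) for j in J)

    # Step 4: a "-" from the chosen partner cancels the pairing.
    for i in range(n):
        if (partner[i], i) in declined:
            partner[i], gap[i] = i, Fraction(0)

    # Step 5: average with the partner (no change if partner[i] == i).
    new_x = [(x[i] + x[partner[i]]) / 2 for i in range(n)]
    return new_x, partner


# Example on the path 0 - 1 - 2 with values 4, 0, 2.
values, partner = pairwise_averaging_step([4, 0, 2], {0: [1], 1: [0, 2], 2: [1]})
```

In this small example nodes 0 and 2 both offer to node 1, which accepts the offer with the larger gap, so nodes 0 and 1 average to the value 2 while node 2 is declined and keeps its value.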



^1 ...but deterministically, to stay within our assumption of deterministic algorithms. For example,
node $i$ can break ties in favor of lower port number.

^2 Ties can be broken arbitrarily, but deterministically as before.

B.2 Sketch of proof

It is not hard to see that this algorithm results in convergence to the average which
is geometric with rate $1 - c/n^3$, for some constant $c$. To put it another way, if the initial
values $x_i(0)$ are in $[0, 1]$ then everyone is within $\epsilon$ of the average in $O(n^3 \log(n/\epsilon))$ time.
An informal outline of the proof of this statement follows.

Sketch of proof.

We first informally describe the main idea. One needs to argue that the
algorithm we just described results in a collection of pairwise averages, i.e., if node $i$
sets $x_i(t + 1) = (x_i(t) + x_j(t))/2$ in the final step, then node $j$ sets $x_j(t + 1) =
(x_j(t) + x_i(t))/2$. Moreover, we will argue that at least one of these pairwise averages
occurs across an edge with a "large" gap; more precisely, we will argue that among
the edges $(i, j)$ maximizing $|x_i(t) - x_j(t)|$, at least one pair $i, j$ match up. This fact
implies, after some simple analysis, that the "sample variance" defined as
$\sum_{i=1}^n (x_i(t) - (1/n) \sum_{j=1}^n x_j(t))^2$ shrinks by a factor of at least $1 - 1/(2n^3)$ from time $t$
to $t + 1$. The desired convergence result then follows.

1. At the beginning of Step 5, if $N(i) = j$ then $N(j) = i$.

In words, the last step always results in the execution of a number of pairwise
averages for disjoint pairs. This can be proven with a case-by-case analysis.

2. Moreover, one of these averages has to happen on an edge with the largest "gap"
$\max_{(i,j) \in E(t)} |x_i(t) - x_j(t)|$, where $E(t)$ is the set of edges in the communication
graph at time $t$.

Indeed, let $J$ be the set of nodes incident on one of these maximizing edges,
and let $J'$ be the set of nodes in $J$ with the smallest value. Then, at least one
node $j' \in J'$ will receive an offer at Step 2 that comes along a maximizing edge.
Any offer $j'$ makes at Step 2 will be along an edge with strictly smaller gap.
Consequently, at Step 3, $j'$ will send a "+" to a sender of an offer along the
maximizing edge (let us call this sender $i'$) and a "-" to everyone else that
sent $j'$ offers, as well as to the node, if any, to which $j'$ made an offer in Step 2.

Since $i'$ and $j'$ are connected by a maximizing edge, there is no way $i'$ receives a
"+" in Step 2 associated with a gap larger than $|x_{i'}(t) - x_{j'}(t)|$, so that $i'$ does
not send $j'$ a "-" at Step 3. Finally, $i'$ and $j'$ average at Step 5.

3. Without loss of generality, we will assume that $\sum_i x_i(0) = 0$. Since every
averaging operation preserves the sum, it follows that $\sum_i x_i(t) = 0$. Let
$V(t) = \sum_i x_i^2(t)$. It is easy to see that $V(t)$ is nonincreasing, and moreover, an averaging
operation of nodes $a$ and $b$ at time $t$ reduces $V$ by $(1/2)(x_a - x_b)^2$.

4. Let $U(t) = \max_i |x_i(t)|$. Clearly, $V(t) \leq nU^2$. However, $\sum_i x_i(t) = 0$ implies
that at least one $x_i(t)$ is negative, so that (considering a path of at most $n - 1$
edges from a node attaining $U$ to a node of the opposite sign)

$$\max_{(i,j) \in E} |x_i(t) - x_j(t)| \geq \frac{U}{n},$$

which implies that

$$V(t + 1) \leq V(t) - \frac{U^2}{2n^2},$$

or

$$V(t + 1) \leq V(t) - \frac{1}{2n^3} V(t).$$

It follows that after $O(n^3)$ steps, $V(t)$ shrinks by a constant factor. Thus if
the initial values $x_i(0)$ are in $[0, 1]$ then everyone is within $\epsilon$ of the average in
$O(n^3 \log(n/\epsilon))$ time.
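The variance reduction used in item 3 can be checked directly. The sketch below, with hypothetical helper names and an arbitrary zero-sum vector chosen for illustration, averages one pair of entries and compares the drop in $\sum_i x_i^2$ with $(1/2)(x_a - x_b)^2$.

```python
from fractions import Fraction


def sum_of_squares(x):
    """V = sum_i x_i^2, the Lyapunov function of the sketch (for zero-sum x)."""
    return sum(v * v for v in x)


def average_pair(x, a, b):
    """Return a copy of x in which entries a and b are replaced by their average."""
    y = list(x)
    y[a] = y[b] = (x[a] + x[b]) / 2
    return y


x = [Fraction(v) for v in (3, -1, -2, 0)]   # illustrative values summing to zero
a, b = 0, 2
y = average_pair(x, a, b)

drop = sum_of_squares(x) - sum_of_squares(y)
predicted = (x[a] - x[b]) ** 2 / 2          # (1/2)(x_a - x_b)^2
```

Here the two quantities agree, and the sum of the entries is unchanged by the averaging operation.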
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Appendix C 



List of assumptions 

Assumption 2.1 (connectivity) The graph 
is connected for every t. 

Assumption 3.1 (non-vanishing weights) The matrix $A(t)$ is nonnegative, stochastic,
and has positive diagonal. Moreover, there exists some $\eta > 0$ such that if $a_{ij}(t) > 0$
then $a_{ij}(t) \geq \eta$.

Assumption 3.2 ($B$-connectivity) There exists an integer $B > 0$ such that the
directed graph

$$(N, E(kB) \cup E(kB + 1) \cup \cdots \cup E((k + 1)B - 1))$$

is strongly connected for all integer $k \geq 0$.

Assumption 3.3 (bounded delays)

(a) If $a_{ij}(t) = 0$, then $\tau^i_j(t) = t$.

(b) $\lim_{t \to \infty} \tau^i_j(t) = \infty$, for all $i, j$.

(c) $\tau^i_i(t) = t$, for all $i, t$.

(d) There exists some $B > 0$ such that $t - B + 1 \leq \tau^i_j(t) \leq t$, for all $i, j, t$.

Assumption 3.4 (double stochasticity) The matrix $A(t)$ is column-stochastic for all $t$,
i.e.,

$$\sum_{i=1}^n a_{ij}(t) = 1,$$

for all $j$ and $t$.

Assumption 3.5 (Bounded round-trip times) There exists some $B > 0$ such that
whenever $(i, j) \in E(t)$, then there exists some $\tau$ that satisfies $|t - \tau| < B$ and
$(j, i) \in E(\tau)$.

Assumption 4.1 (connectivity relaxation) Given an integer $t \geq 0$, suppose that the
components of $x(tB)$ have been reordered so that they are in nonincreasing order. We
assume that for every $d \in \{1, \ldots, n - 1\}$, we either have $x_d(tB) = x_{d+1}(tB)$, or there
exist some time $t' \in \{tB, \ldots, (t + 1)B - 1\}$ and some $i \in \{1, \ldots, d\}$, $j \in \{d + 1, \ldots, n\}$
such that $(i, j)$ or $(j, i)$ belongs to $\mathcal{E}(A(t'))$.

Assumption 7.1 For all $i$, $x_i(0)$ is a multiple of $1/Q$.






Bibliography 



[1] D. Acemoglu, A. Ozdaglar, A. ParandehGheibi, "Spread of (Mis)information in 
Social Networks," LIDS report 2812, to appear in Games and Economic Behavior, 
2009. 

[2] D. Angeli, P.-A. Bliman, "Stability of leaderless multi-agent systems. Extension 
of a result by Moreau," Mathematics of Control, Signals, and Systems, vol. 18, 
no. 4, pp. 293-322, 2006. 

[3] D. Angluin, "Local and global properties in networks of processors," Proceedings 
of the Twelfth Annual ACM Symposium on the Theory of Computing, Los Angeles, 
USA, Apr 1980. 

[4] Y. Afek, Y. Matias, "Elections in Anonymous Networks," Information and Com- 
putation, vol. 113, No. 2, pp. 312-330, 1994. 

[5] P. Alriksson, A. Rantzer, "Distributed Kalman Filtering Using Weighted Averag- 
ing," Proceedings of the 17th International Symposium on Mathematical Theory 
of Networks and Systems, Kyoto, Japan, Jul 2006. 

[6] J. Aspnes, E. Ruppert, "An introduction to population protocols," Bulletin of the 
European Association for Theoretical Computer Science, vol. 93, pp. 98-117, 2007. 

[7] H. Attiya, M. Snir, M.K. Warmuth, "Computing on an anonymous ring," Journal 
of the ACM, vol. 35, no. 4, pp. 845-875, 1988. 

[8] T.C. Aysal, M. Coates, M. Rabbat, "Distributed average consensus using
probabilistic quantization," 14th IEEE/SP Workshop on Statistical Signal Processing,
Madison, USA, Aug 2007.

[9] P. Barooah, "Estimation and Control with Relative Measurements: Algorithms and
Scaling Laws," Ph.D. Thesis, University of California at Santa Barbara, 2007.

[10] F. Benezit, A. G. Dimakis, P. Thiran and M. Vetterli, "Order-Optimal Consensus 
through Randomized Path Averaging," Proceedings of Forty-Fifth Annual Aller- 
ton Conference on Communication, Control, and Computing, Monticello, USA, 
Sep. 2007. 






[11] F. Benezit, P. Thiran, M. Vetterli, "Interval consensus: from quantized gossip 
to voting," Proceedings of the International Conference on Acoustics, Speech, and 
Signal Processing, Las Vegas, USA, Mar 2008. 

[12] D. P. Bertsekas and R. G. Gallager, Data Networks, Prentice Hall, 2nd edition, 
1991. 

[13] D. P. Bertsekas and J. N. Tsitsiklis, "Comment on 'Coordination of Groups of
Mobile Autonomous Agents Using Nearest Neighbor Rules'," IEEE Transactions
on Automatic Control, Vol. 52, No. 5, May 2007, pp. 968-969.

[14] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation:
Numerical Methods. Englewood Cliffs, NJ: Prentice-Hall, 1989 [Online]. Available:
http://hdl.handle.net/1721.1/3719

[15] D. P. Bertsekas, and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Sci- 
entific, 1996. 

[16] P.A. Bliman and G. Ferrari- Trecate, "Average consensus problems in networks 
of agents with delayed communications," Proceedings of the Joint 44th IEEE Con- 
ference on Decision and Control and European Control Conference, Seville, Spain, 
Dec 2005. 

[17] V.D. Blondel, J.M. Hendrickx, A. Olshevsky, J.N. Tsitsiklis, "Convergence in 
Multiagent Coordination, Consensus, and Flocking," Proceedings of CDC 05, the 
44th IEEE Conference on Decision and Control, Seville, Spain, Dec 2005. 

[18] S. Boyd, P. Diaconis, J. Sun, and L. Xiao, "Fastest mixing Markov chain on a 
path," The American Mathematical Monthly, vol. 113, no. 1, pp. 70-74, Jan 2006. 

[19] S. Boyd, P. Diaconis, P.A. Parrilo and L. Xiao, "Fastest mixing Markov chain
on graphs with symmetries," SIAM Journal on Optimization, vol. 20, no. 2, pp.
792-819, 2009.

[20] S. P. Boyd, A. Ghosh, B. Prabhakar, D. Shah, "Gossip Algorithms: Design, Anal- 
ysis and Applications," Proceedings of Twenty-fourth IEEE International Confer- 
ence on Computer Communications, Miami, USA, Mar 2005. 

[21] R. Carli, F. Bullo, S. Zampieri, "Quantized Average Consensus via Dynamic 
Coding/Decoding Schemes," International Journal of Robust and Nonlinear Con- 
trol, vol. 20, pp. 156, 2010 

[22] R. Carli, A. Chiuso, L. Schenato, S. Zampieri, "A probabilistic analysis of the
average consensus algorithm with quantized communication," Proceedings of the
17th World Congress The International Federation of Automatic Control, Seoul,
South Korea, Jul 2008.






[23] R. Carli, A. Chiuso, L. Schenato, S. Zampieri, "Distributed Kalman filtering 
based on consensus strategies," IEEE Journal on Selected Areas in Communica- 
tions, vol. 26, pp. 622-633, 2008 

[24] G.C Calafiore, F. Abrate, "Distributed linear estimation over sensor networks," 
International Journal of Control, vol. 82, no. 5, pp. 868-882, 2009. 

[25] M. Cao, A. S. Morse, B. D. O. Anderson, "Coordination of an Asynchronous,
Multi-Agent System via Averaging," Proceedings of the 16th IFAC
World Congress, Prague, Czech Republic, Jul 2005.

[26] B. Chazelle, "The Convergence of Bird Flocking," arXiv:0905.4241v1, May 2009.

[27] B. Chazelle, "Natural Algorithms," Proceeding of the 20th Symposium on Dis- 
crete Algorithms, New York, USA, 2009. 

[28] R. Carli, F. Bullo, "Quantized coordination algorithms for rendezvous and
deployment," SIAM Journal on Control and Optimization, vol. 48, no. 3, pp. 1251-
1274, 2009.

[29] H.-L. Choi, L. Brunet, J. P. How, "Consensus-based decentralized auctions for 
task assignment," IEEE Transactions on Robotics, vol. 25, no. 4, 2009. 

[30] S. Chatterjee and E. Seneta, "Towards consensus: Some convergence theorems
on repeated averaging," Journal of Applied Probability, vol. 14, pp. 89-97, 1977.

[31] J. Cortes, "Finite-time convergent gradient flows with applications to network
consensus," Automatica, vol. 42, no. 11, pp. 1993-2000, 2006.

[32] R. Carli, F. Fagnani, P. Frasca, T. Taylor, and S. Zampieri, "Average consensus 
on networks with transmission noise or quantization," Proceedings of European 
Control Conference, Kos, Greece, 2007. 

[33] I. Cidon, Y. Shavitt, "Message terminating algorithms for anonymous rings of 
unknown size," Information Processing Letters, vol. 54, no. 2, pp. 111-119, 1995. 

[34] F. Cucker, S. Smale, "Emergent behavior in flocks," IEEE Transactions on
Automatic Control, vol. 52, no. 5, pp. 852-862, 2007.

[35] G. Cybenko, "Dynamic load balancing for distributed memory multiprocessors," 
Journal of Parallel and Distributed Computing, vol. 7, no. 2, pp. 279-301, 1989. 

[36] M. H. DeGroot, "Reaching a consensus," Journal of the American Statistical 
Association, vol. 69, no. 345, pp. 118-121, 1974. 

[37] A. Dimakis, A. Sarwate, M.J. Wainwright, "Geographic gossip: efficient 
averaging on sensor networks," IEEE Transactions on Signal Processing, vol. 56, 
no. 3, pp. 1205-1216, 2008. 






[38] M. Draief, M. Vojnovic, "Convergence speed of binary interval consensus," 
Proceedings of the Twenty-ninth IEEE Conference on Computer Communications, 
San Diego, USA, Mar 2010. 

[39] S. Del Favero, S. Zampieri, "Distributed Estimation through Randomized Gossip 
Kalman Filter," Proceedings of the Joint 48th IEEE Conference on Decision and 
Control and 28th Chinese Control Conference, Shanghai, China, Dec 2009. 

[40] P. Frasca, R. Carli, F. Fagnani, S. Zampieri, "Average consensus on networks 
with quantized communication," International Journal of Robust and Nonlinear 
Control, vol. 19, no. 6, pp. 1787-1816, 2009. 

[41] N.M. Freris, P.R. Kumar, "Fundamental limits on synchronization of affine clocks 
in networks," Proceedings of 46th IEEE Conference on Decision and Control, New 
Orleans, USA, Dec 2007. 

[42] F. Fich, E. Ruppert, "Hundreds of impossibility results for distributed comput- 
ing," Distributed Computing, vol. 16, pp. 121-163, 2003. 

[43] C. Gao, J. Cortes, F. Bullo, "Notes on averaging over acyclic digraphs and 
discrete coverage control," Automatica, vol. 44, no. 8, pp. 2120-2127, 2008. 

[44] Balazs Gerencser, "Potentials and Limitations to Speed up MCMC Methods 
Using Non-Reversible Chains," Proceedings of the 19th International Symposium on 
Mathematical Theory of Networks and Systems (MTNS 2010), Budapest, 
Hungary, July 2010. 

[45] A. Giridhar, P.R. Kumar, "Computing and communicating functions over sensor 
networks," IEEE Journal on Selected Areas in Communications, vol. 23, no. 4, 
pp. 755-764, 2005. 

[46] P. Gacs, G.L. Kurdyumov, L.A. Levin, "One-dimensional uniform arrays that 
wash out finite islands," Problemy Peredachi Informatsii, vol. 14, no. 3, pp. 92-96, 
1978. 

[47] B. Golub, M.J. Jackson, "Naive Learning in Social Networks and the Wisdom of 
Crowds," American Economic Journal: Microeconomics, vol. 2, no. 1, pp. 112-149, 
2010. 

[48] A.G. Greenberg, P. Flajolet, R. Ladner, "Estimating the multiplicity of conflicts 
to speed their resolution in multiple access channels," Journal of the ACM, vol. 
34, no. 2, pp. 289-325, 1987. 

[49] J. M. Hendrickx and V. D. Blondel, "Convergence of different linear and non- 
linear Vicsek models," Universite catholique de Louvain, CESAME research re- 
port, RR 2005.57, 2005. 

[50] J. M. Hendrickx, A. Olshevsky, J.N. Tsitsiklis, "Distributed Anonymous Discrete 
Function Computation and Averaging," preprint, 2010. 






[51] Y. Hassin, D. Peleg, "Distributed probabilistic polling and applications to pro- 
portionate agreement," Information and Computation, vol. 171, no. 2, pp. 248-268, 
2001. 

[52] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile au- 
tonomous agents using nearest neighbor rules," IEEE Transactions on Automatic 
Control, vol. 48, no. 6, pp. 988-1001, 2003. 

[53] K. Jung, D. Shah, J. Shin, "Distributed averaging via lifted Markov chains," 
preprint, 2008. 

[54] A. Kashyap, T. Basar, and R. Srikant, "Quantized consensus," Proceedings of 
the 45th IEEE Conference on Decision and Control, San Diego, USA, Dec 2006. 

[55] A. Kashyap, T. Basar, R. Srikant, "Quantized consensus," Automatica, vol. 43, 
no. 7, pp. 1192-1203, 2007. 

[56] N. Katenka, E. Levina, G. Michailidis, "Local vote decision fusion for target 
detection in wireless sensor networks," IEEE Transactions on Signal Processing, 
vol. 56, no. 1, pp. 329-338, 2008. 

[57] N. Khude, A. Kumar, A. Karnik, "Time and energy complexity of distributed 
computation in wireless sensor networks," Proceedings of Twenty-fourth IEEE 
International Conference on Computer Communications, Miami, USA, Mar 2005. 

[58] Y. Kim, M. Mesbahi, "On maximizing the second smallest eigenvalue of a state- 
dependent graph Laplacian," IEEE Transactions on Automatic Control, vol. 51, 
no. 1, pp. 116-120, Jan. 2006. 

[59] S. Kar, J.M. Moura, "Distributed average consensus in sensor networks with 
quantized inter-sensor communication," IEEE International Conference on Acous- 
tics, Speech and Signal Processing, Las Vegas, USA, Apr 2008. 

[60] E. Kranakis, D. Krizanc, J. van den Berg, "Computing boolean functions on 
anonymous networks," Proceedings of the 17th International Colloquium on Au- 
tomata, Languages and Programming, Warwick, UK, Jul 1990. 

[61] N. Lynch, Distributed Algorithms, Morgan Kaufmann, 1996. 

[62] M. Land, R.K. Belew, "No perfect two-state cellular automaton for density clas- 
sification exists," Physical Review Letters, vol. 74, no. 25, pp. 5148-5150, 1995. 

[63] L. Liss, Y. Birk, R. Wolff, A. Schuster, "A local algorithm for ad hoc majority 
voting via charge fusion," Proceedings of the 18th Annual Conference on Distributed 
Computing, Amsterdam, the Netherlands, Oct 2004. 

[64] D. Liberzon, A.S. Morse, "Basic problems in stability and design of switched 
systems," IEEE Control Systems Magazine, vol. 19, no. 5, pp. 59-70, 1999. 






[65] Y. Lei, R. Srikant, G.E. Dullerud, "Distributed symmetric function computation 
in noisy wireless sensor networks," IEEE Transactions on Information Theory, 
vol. 53, no. 12, pp. 4826-4833, 2007. 

[66] H.J. Landau and A.M. Odlyzko, "Bounds for the eigenvalues of certain stochastic 
matrices," Linear Algebra and its Applications, no. 38, pp. 5-15, 1981. 

[67] S. Li, H. Wang, "Multi-agent coordination using nearest-neighbor rules: 
revisiting the Vicsek model," 2005, http://arxiv.org/abs/cs.MA/0407021 

[68] W. Liu, M.B. Short, Y.E. Taima, A.L. Bertozzi, "Multiscale Collaborative 
Searching Through Swarming," Proceedings of the 7th International Conference 
on Informatics in Control, Automation, and Robotics, Portugal, June 2010. 

[69] F. Lekien, N.E. Leonard, "Nonuniform Coverage and Cartograms," SIAM 
Journal on Control and Optimization, vol. 48, no. 1, pp. 351-372, 2009. 

[70] J. Lorenz, D. A. Lorenz, "On Conditions for Convergence to Consensus," IEEE 
Transactions on Automatic Control, to appear. 

[71] S. Martinez, F. Bullo, J. Cortes, E. Frazzoli, "On synchronous robotic networks 
- Part I: models, tasks, complexity," IEEE Transactions on Automatic Control, 
vol. 52, no. 12, pp. 2199-2213, 2007. 

[72] C. C. Moallemi and B. Van Roy, "Consensus Propagation," IEEE Transactions 
on Information Theory, vol. 52, no. 11, pp. 4753-4766, 2005. 

[73] L. Moreau, "Stability of multi-agent systems with time-dependent communica- 
tion links," IEEE Transactions on Automatic Control, vol. 50, no. 2, pp. 169-182, 
2005. 

[74] S. Mukherjee, H. Kargupta, "Distributed probabilistic inferencing in sensor 
networks using variational approximation," Journal of Parallel and Distributed 
Computing, vol. 68, no. 1, pp. 78-92, 2008. 

[75] S. Moran, M.K. Warmuth, "Gap theorems for distributed computation," SIAM 
Journal on Computing, vol. 22, no. 2, pp. 379-394, 1993. 

[76] A. Nedic, A. Olshevsky, A. Ozdaglar, and J. N. Tsitsiklis, "On distributed 
averaging algorithms and quantization effects," IEEE Transactions on Automatic 
Control, vol. 54, no. 11, pp. 2506-2517, 2009. 

[77] A. Olshevsky, "Convergence speed in distributed consensus and averaging," M.S. 
Thesis, Department of EECS, MIT. 

[78] A. Olshevsky, "On the Nonexistence of Quadratic Lyapunov Functions for 
Consensus Algorithms," IEEE Transactions on Automatic Control, vol. 53, no. 11, 
pp. 2642-2645, 2008. 






[79] A. Olshevsky, J.N. Tsitsiklis, "A Lower Bound on Distributed Averaging," 
preprint. 

[80] B. Oreshkin, M. Coates, and M. Rabbat, "Optimization and analysis of 
distributed averaging with short node memory," IEEE Transactions on Signal 
Processing, in press. 

[81] R. Olfati-Saber, "Distributed Kalman Filter with Embedded Consensus Filters," 
Proceedings of the 44th IEEE Conference on Decision and Control and European 
Control Conference, Seville, Spain, Dec 2005. 

[82] R. Olfati-Saber, "Distributed Kalman Filtering for Sensor Networks," Proc. of 
the 46th IEEE Conference on Decision and Control, New Orleans, USA, Dec 2007. 

[83] R. Olfati-Saber, J. A. Fax, and R. M. Murray, "Consensus and Cooperation in 
Networked Multi-Agent Systems," Proceedings of the IEEE, vol. 95, no. 1, pp. 
215-233, Jan. 2007. 

[84] R. Olfati-Saber and R. M. Murray, "Consensus problems in networks of agents 
with switching topology and time-delays," IEEE Transactions on Automatic 
Control, vol. 49, no. 9, pp. 1520-1533, 2004. 

[85] A. Olshevsky, J.N. Tsitsiklis, "Convergence Speed in Distributed Consensus and 
Averaging," SIAM Journal on Control and Optimization, vol. 48, no. 1, pp. 33-55, 
2009. 

[86] E. Perron, D. Vasudevan, M. Vojnovic, "Using three states for binary consensus 
on complete graphs," preprint, 2008. 

[87] L. Schenato, G. Gamba, "A distributed consensus protocol for clock 
synchronization in wireless sensor network," Proc. of the 46th IEEE Conference on 
Decision and Control, New Orleans, USA, Dec 2007. 

[88] S. Sundaram, C.N. Hadjicostis, "Finite-time distributed consensus in graphs with 
time-invariant topologies," Proceedings of the American Control Conference, New 
York, NY, July 2007. 

[89] M. Schwager, J.-J. Slotine, D. Rus, "Consensus Learning for Distributed 
Coverage Control," Proceedings of International Conference on Robotics and 
Automation, Pasadena, CA, May 2008. 

[90] S. Treil, Linear Algebra Done Wrong; http://www.math.brown.edu/~treil/papers/LADW/LADW. 

[91] J. N. Tsitsiklis, "Problems in decentralized decision making and computation," 
Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Mass. Inst. Technol., 
Cambridge, MA, 1984 [Online]. Available: http://hdl.handle.net/1721.1/15254 

[92] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, "Distributed asynchronous 
deterministic and stochastic gradient optimization algorithms," IEEE Transactions 
on Automatic Control, vol. 31, no. 9, pp. 803-812, Sep. 1986. 






[93] K. Tsianos and M. Rabbat, "Fast decentralized averaging via multi-scale gossip," 
Proc. IEEE Distributed Computing in Sensor Systems, Santa Barbara, June, 2010. 

[94] T. Vicsek, A. Czirok, E. Ben-Jacob, I. Cohen, and O. Schochet, "Novel type of 
phase transitions in a system of self-driven particles," Physical Review Letters, 
vol. 75, no. 6, pp. 1226-1229, Aug. 1995. 

[95] J. Wolfowitz, "Products of indecomposable, aperiodic, stochastic matrices," 
Proceedings of the American Mathematical Society, vol. 15, pp. 733-737, 1963. 

[96] T. Wongpiromsarn, K. You, and L. Xie, "A Consensus Approach to the 
Assignment Problem: Application to Mobile Sensor Dispatch," preprint. 

[97] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems 
and Control Letters, no. 53, pp. 65-78, 2004. 

[98] L. Xiao, S. Boyd, S. Lall, "A Scheme for Robust Distributed Sensor Fusion Based 
on Average Consensus," Proceedings of International Conference on Information 
Processing in Sensor Networks, Los Angeles, USA, Apr 2005. 

[99] L. Xiao, S. Boyd, S. Lall, "A Space-Time Diffusion Scheme for Peer-to-Peer 
Least-Squares Estimation," Proceedings of Fifth International Conference on In- 
formation Processing in Sensor Networks, Nashville, TN, Apr 2006. 

[100] L. Xiao, S. Boyd, and S.-J. Kim, "Distributed average consensus with 
least-mean-square deviation," Journal of Parallel and Distributed Computing, no. 67, 
pp. 33-46, 2007. 

[101] G. Xiong, S. Kishore, "Analysis of Distributed Consensus Time Synchroniza- 
tion with Gaussian Delay over Wireless Sensor Networks," EURASIP Journal on 
Wireless Communications and Networking, 2009. 

[102] M. Yamashita, T. Kameda, "Computing on an anonymous network," IEEE 
Transactions on Parallel and Distributed Systems, vol. 7, no. 1, pp. 69-89, 1996. 

[103] M. Zhu, S. Martinez, "On the convergence time of asynchronous distributed 
quantized averaging algorithms," preprint, 2008. 






